Re: [RFC,PATCH 14/14] utrace core

2009-12-16 Thread Roland McGrath
 C does implicit casts from enum to integer types, right? So always using
 u32 here would not impose any extra typing on the user, or am I missing
 something subtle here?

No, that's right.  I had just been thinking of the documentation /
convenience issue.  I figured with u32 people might tend to think they
always had to write utrace_resume_action(action) like you do in
report_signal, or would want it to get the enum so e.g. you can write a
switch and have gcc give those warnings about covering all the enum cases.
But you have convinced me that the consistency of using u32 everywhere is
the what's really easiest to understand.

 I don't mind the sharing of the argument, it just looked odd to have the
 u32/unsigned int/enum thing intermixed, since you care about typing
 length (as good a criteria as any) I'd just be lazy and make everything
 u32 ;-)

That's good enough for me.

 As to the content, can't you accomplish the same thing by processing
 such exclusive parent registration before exposing the child in the
 pid-hash? Right before cgroup_fork_callback() in
 kernel/fork.c:copy_process() seems like the ideal site.

As Oleg mentioned, that would add some complications.  The task is not
really fully set up at that point and the clone might actually still fail.
The final stages where the clone can fail are necessarily inside some
important locks held while making the new task visible for lookup.

One of the key features of the utrace API is that callbacks are called in
uncomplicated contexts so you just don't have to worry about a lot of
strangeness and arcane constraints.  That means making such a callback
while holding any spin lock or suchlike is really out of the question.

So, either we have to make a callback where you suggest, before the task
really exists, or where we do now, after the task is visible to others.  An
unfinished task that doesn't yet have all its pointers in place, and still
might be freed before it could ever run, would add a whole lot of hair to
what the utrace API semantics would be.  If you get the callback that
early, then you can attach to it that early, and then you expect some
callbacks about it actually failing ever to exist.  And meanwhile, you
might have to know what you can and can't try to do with a task that is not
really set up yet.  It just seems like a really bad idea.

Hence, we are stuck with the post-clone callback being really post-clone so
that it's possible for other things in the system to see the new task and
try to utrace_attach it.  But, as I said, we are not actually relying on
having a way to rule that out at the utrace level today.  So I think we can
just take out this hack and reconsider it later when we have an active need.

 Best would be to fix up the utrace-ptrace implementation and get rid of
 those other kludges I think, not sure.. its all a bit involved and I'm
 not at all sure I'm fully aware of all the ptrace bits.

We haven't figured out all the best ways to clean up ptrace the rest of
the way yet.  We'd like to keep improving that incrementally after the
basics of utrace are in the tree.

  [...] Surely you are
  not suggesting that all these callbacks should be made with a spin lock
  held, because that would obviously be quite insane.
 
 Because there can be many engines attached? 

Because it's a callback API.  A callback is allowed to use mutex_lock(),
is expected to be preemptible, etc.  Having an interface where you let
somebody's function unwittingly run with spin locks held, preemption
disabled, and so forth, is just obviously a terrible interface.

 Or in case of utrace_reap() maybe push the spin_lock() into it?

I did a restructuring to make that possible.  IMHO it makes the control
flow a bit less clear.  But it came out a bit smaller in the text, so
I'll go with it.

 I'm well aware that ptrace had some quirky bits in, and this might well
 be one of the crazier parts of it, but to the un-initiated reviewer (me)
 this could have done with a comment explaining exactly why this
 particular site is not worth properly abstracting etc..

We'll add more comments there.

 Not if the comment right above the WARN_ON() says that its a clueless
 caller.. but if you're really worried about it, use something like:
 
 WARN(cond, Dumb-ass caller\n);

Oh, that's much better.  I've made all the WARN_ON's give some text.


Thanks,
Roland



Re: [RFC,PATCH 14/14] utrace core

2009-12-14 Thread Peter Zijlstra
On Sun, 2009-12-13 at 16:25 -0800, Roland McGrath wrote:
 I'm sorry for the delay.  I'm picking up with responding to the parts of
 your review that I did not include in my first reply.  I appreciate very
 much the discussion you've had with Oleg about the issues that I did not
 address myself.  I look forward to your replies to my comments in that
 first reply, which we have yet to see.

Yeah, no worries, I'm not too quick these days myself..

  Seems inconsistent on u32 vs enum utrace_resume_action.
  
  Why force enum utrace_resume_action into a u32?
 
 The first argument to almost all callbacks (all the ones made when the
 task is alive) is called action and it's consistent that its low bits
 contain an enum utrace_resume_action.  The argument is u32 action in
 the report_signal and report_syscall_entry callbacks, where it combines
 an enum utrace_resume_action with an enum utrace_{signal,syscall}_action
 (respectively).  It would indeed be more consistent to use u32 action
 throughout, but it seemed nicer not to have engine writers always
 writing utrace_resume_action(action) for all the cases where there are
 no other bits in there.

C does implicit casts from enum to integer types, right? So always using
u32 here would not impose any extra typing on the user, or am I missing
something subtle here?

 The return value is a mirror of the u32 action argument.  In many
 calls, it has only an enum utrace_resume_action in it.  But in some it
 combines that with another enum utrace_*_action.  There I went for
 consistency (and less typing) in the return type, since it doesn't make
 any difference to how you have to write the code in your callback
 function.  The main reason I used u32 at all instead of unsigned int
 is just its shortness for less typing.
 
 I don't really mind changing these details to whatever people think is
 best.  The other people writing code to use the utrace API may have more
 opinions than I do.  I guess it could even be OK to make the return
 value and action argument always a plain enum utrace_resume_action,
 and use a second in/out enum utrace_{signal,syscall}_action *
 parameter in those particular calls.  But that does put some more
 register pressure and loads/stores into this API.

I don't mind the sharing of the argument, it just looked odd to have the
u32/unsigned int/enum thing intermixed, since you care about typing
length (as good a criteria as any) I'd just be lazy and make everything
u32 ;-)

  Quite gross this.. can't we key off the
  tracehoook_report_clone_complete() and use a wakeup there?
 
 Yes, we intended to clean this up at some point later.  But doing that
 is just a distraction and complication right now so we put it off.  For
 this case, however, I suppose we could just punt for the initial version.
 
 This code exists to support the special semantics of calling
 utrace_attach_task() from inside the parent's report_clone() callback.
 That guarantees the parent that it wins any race with any third thread
 calling utrace_attach_task().  

 This guarantees it will be first attacher
 in the callback order, but the general subject of callback order is
 already something we think we will want to revisit in the future after
 we have more experience with lots of different engines trying to work
 together.  

 It's most meaningful when using the UTRACE_ATTACH_EXCLUSIVE
 flag -- then you can use UTRACE_ATTACH_EXCLUSIVE|UTRACE_ATTACH_MATCH_OPS
 to synchronize with other threads trying to attach the same kind of
 engine to a task, and give special priority in that to the engine that
 attaches in report_clone() from tracing the parent.

I broke the text so it reads easier for me, it might be me, it might not
be proper English -- I'm not a native speaker -- but this is an example
of what you asked for below.

The thing is, your sentences are rather long, with lots of sub-parts and
similar. I find I need a break after digesting a few such things.

As to the content, can't you accomplish the same thing by processing
such exclusive parent registration before exposing the child in the
pid-hash? Right before cgroup_fork_callback() in
kernel/fork.c:copy_process() seems like the ideal site.

 Really I came up with this special semantics originally just with ptrace
 in mind.  ptrace is an engine that wants to exclude other tracer threads
 attaching asynchronously with the same kind of engine, and that wants to
 give special priority on a child to the parent's tracer (to implement
 PTRACE_O_TRACECLONE et al).  However, Oleg's current ptrace code still
 relies on the old hard-coded locking kludges to exclude the separate
 attachers and to magically privilege the auto-attach from the parent's
 tracer.  So we are not relying on this special semantics yet.
 
 We could just punt utrace_attach_delay() and remove the associated
 documentation about the special rights of report_clone() calling
 utrace_attach_task().  If we decide it helps clean things up later when
 we finish more 

Re: [RFC,PATCH 14/14] utrace core

2009-12-14 Thread Oleg Nesterov
On 12/13, Roland McGrath wrote:

  Its not entirely clear why we can check pending_attach outside of the
  utrace-lock and not be racy.

 I think it can be racy sometimes but that does not matter.
 Maybe Oleg can verify my logic here.  If it's right, he can
 add some comments to make it more clear.

 There is only a very limited sort of timeliness guarantee about
 getting your callbacks after utrace_attach_task()+utrace_set_events().
 If you know somehow that the task was definitely still in TASK_STOPPED
 or TASK_TRACED after utrace_attach_task() returned, then your engine
 gets all possible callbacks starting from when it resumes.  Otherwise,
 you can use utrace_control() with UTRACE_REPORT to ask to get some
 callback pretty soon.  The only callback events you are ever 100%
 guaranteed about (after success return from utrace_set_events()) are for
 DEATH and REAP, which have an unconditional lock-and-check before making
 engine callbacks.

Yes, I think this is correct. It is fine to miss -pending_attach == T,
and in any case the new attacher can come right after the check, even
if it was checked under utrace-lock.

It is important that the tracee can't miss, say, UTRACE_REPORT request
(as you already explained), and every time the tracee clears -resume
it calls splice_attaching().

 In the stopped cases, there are lots of locks and barriers and things
 after resuming.  (Oleg?)

Every time the tracee resumes after TASK_TRACED it uses utrace-lock
to synchronize with utrace_control/etc, it must see any changes.

Oleg.



Re: [RFC,PATCH 14/14] utrace core

2009-12-14 Thread Oleg Nesterov
On 12/14, Peter Zijlstra wrote:

   Quite gross this.. can't we key off the
   tracehoook_report_clone_complete() and use a wakeup there?
 
  Yes, we intended to clean this up at some point later.  But doing that
  is just a distraction and complication right now so we put it off.  For
  this case, however, I suppose we could just punt for the initial version.
 
  This code exists to support the special semantics of calling
  utrace_attach_task() from inside the parent's report_clone() callback.
  That guarantees the parent that it wins any race with any third thread
  calling utrace_attach_task().

  This guarantees it will be first attacher
  in the callback order, but the general subject of callback order is
  already something we think we will want to revisit in the future after
  we have more experience with lots of different engines trying to work
  together.

  It's most meaningful when using the UTRACE_ATTACH_EXCLUSIVE
  flag -- then you can use UTRACE_ATTACH_EXCLUSIVE|UTRACE_ATTACH_MATCH_OPS
  to synchronize with other threads trying to attach the same kind of
  engine to a task, and give special priority in that to the engine that
  attaches in report_clone() from tracing the parent.

 As to the content, can't you accomplish the same thing by processing
 such exclusive parent registration before exposing the child in the
 pid-hash? Right before cgroup_fork_callback() in
 kernel/fork.c:copy_process() seems like the ideal site.

I thought about this too, but there are some complications. Just for
example, what if copy_process() fails after we already reported the
new child was attached. And, the new child is not properly initialized
yet, it doesn't even have the valid pid/real_parent/etc. Just imagine
a callback wants to record task_pid_vnr(). Or it decides to send a
fatal signal (even private) to the new tracee.

 Best would be to fix up the utrace-ptrace implementation and get rid of
 those other kludges I think, not sure.. its all a bit involved and I'm
 not at all sure I'm fully aware of all the ptrace bits.

It is not that utrace-ptrace is buggy. We try to preserve the current
semantics. Yes, we can move this kludge to ptrace layer, but I am not
sure about other UTRACE_ATTACH_EXCLUSIVE engines.

If we move this to ptrace, then we can probably mark the new child
as ptrace_attach() should fail, because we are going to auto-attach.
But in this case we need multiple hooks in do_fork() path again, like
the old ptrace does, while utrace needs only one.

 The major improvement this utrace stuff brings over the old ptrace is
 that it fully multiplexes the task tracing bits, however if the new bit
 isn't powerful enough to express all of the old bits with that then that
 seems to take the shine of the new bits.

Note that it would be easy to add another callback, and hide this
special case. But we should think twice before doing this.

 I'm well aware that ptrace had some quirky bits in, and this might well
 be one of the crazier parts of it, but to the un-initiated reviewer (me)
 this could have done with a comment explaining exactly why this
 particular site is not worth properly abstracting etc..

Well, agreed.

  In the pretty soon case, that means set_notify_resume:
  if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_RESUME))
  kick_process(task);
  i.e. IPI after test_and_set.  The test_and_set implies an smp_mb().
  So it should be the case that the set of utrace-pending_attach was seen
  before the set of TIF_NOTIFY_RESUME.

 Not exactly sure where the matching rmb() would come from in
 start_report() and friends -- task_utrace_struct() ?

I am a bit confused...

As Roland said, we don't have any timing guarantees, and we can't have.
Whatever we do, the tracee can miss, say, -report_syscall_entry() event
even if it does a system call after utrace_set_events(UTRACE_EVENT_SYSCALL),
this is fine.

But it shouldn't miss TIF_NOTIFY_RESUME/signal_pending/etc. I mean,
sooner or later it must hanlde the signal or TIF_NOTIFY_RESUME. And in
this case we can't miss -pending_attach.

IOW, we must ensure that if ever clear TIF_NOTIFY_RESUME we must not
miss -pending_attach, correct? and for this case we have mb() after
clear_thread_flag(). Perhaps instead we should add mb__after_clear_bit()
into arch/ hooks, but this needs a lot of arch-dependent changes.



And, btw, the usage of -replacement_session_keyring looks racy,
exactly because (without utracee) we done have the necessary barriers
on both sides.

Oleg.



Re: [RFC,PATCH 14/14] utrace core

2009-12-14 Thread Oleg Nesterov
On 12/14, Oleg Nesterov wrote:

 IOW, we must ensure that if ever clear TIF_NOTIFY_RESUME we must not
 miss -pending_attach, correct? and for this case we have mb() after
 clear_thread_flag(). Perhaps instead we should add mb__after_clear_bit()
 into arch/ hooks, but this needs a lot of arch-dependent changes.

Cough. And I always read this rmb as mb. Even when I changed
the comment to explain that we need a barrier between clear bit
and read flags, I didn't notice it is in fact rmb...

I guess we need the trivial fix, Roland?

Oleg.



Re: [RFC,PATCH 14/14] utrace core

2009-12-14 Thread Roland McGrath
 On 12/14, Oleg Nesterov wrote:
 
  IOW, we must ensure that if ever clear TIF_NOTIFY_RESUME we must not
  miss -pending_attach, correct? and for this case we have mb() after
  clear_thread_flag(). Perhaps instead we should add mb__after_clear_bit()
  into arch/ hooks, but this needs a lot of arch-dependent changes.

Since it's utrace/tracehook code that relies on the barrier I think it
makes sense to have it in tracehook_notify_resume() or utrace_resume().
The arch requirement is having done clear_thread_flag() beforehand, so the
arch-independent code can reasonably assume whatever semantics that is
guaranteed to have.

 Cough. And I always read this rmb as mb. Even when I changed
 the comment to explain that we need a barrier between clear bit
 and read flags, I didn't notice it is in fact rmb...
 
 I guess we need the trivial fix, Roland?

You're the barrier man, send me what changes it should get.


Thanks,
Roland



Re: [RFC,PATCH 14/14] utrace core

2009-12-14 Thread Roland McGrath
 Yes, I think this is correct. It is fine to miss -pending_attach == T,
 and in any case the new attacher can come right after the check, even
 if it was checked under utrace-lock.

Right.

 It is important that the tracee can't miss, say, UTRACE_REPORT request
 (as you already explained), and every time the tracee clears -resume
 it calls splice_attaching().

Right.

  In the stopped cases, there are lots of locks and barriers and things
  after resuming.  (Oleg?)
 
 Every time the tracee resumes after TASK_TRACED it uses utrace-lock
 to synchronize with utrace_control/etc, it must see any changes.

And TASK_STOPPED?

Please send me patches to add whatever comments would make all this clear
enough to Peter when reading the code.


Thanks,
Roland



Re: [RFC,PATCH 14/14] utrace core

2009-12-14 Thread Oleg Nesterov
On 12/14, Roland McGrath wrote:

   In the stopped cases, there are lots of locks and barriers and things
   after resuming.  (Oleg?)
 
  Every time the tracee resumes after TASK_TRACED it uses utrace-lock
  to synchronize with utrace_control/etc, it must see any changes.

 And TASK_STOPPED?

SIGCONT can wake up the TASK_STOPPED tracee. I don't think the tracer
should ever rely on TASK_STOPPED (utrace never does). If the tracer
needs the really stopped tracee we have UTRACE_STOP, and this means
TASK_TRACED.

But I am not sure I understand this part of discussion... In any case
the tracee should see any changes which were done before the wakeup.
But TASK_STOPPED can't guarantee the tracee won't be resumed until
the tracer wakes it up. Of course, TASK_TRACED can't prevent SIGKILL,
but in this case we should only care about the tracee can't resume
until we drop utrace-lock case.

Oleg.



Re: [RFC,PATCH 14/14] utrace core

2009-12-13 Thread Roland McGrath
 All that seems to do is call -release() and kmem_cache_free()s the
 utrace_engine thing, why can't that be done with utrace-lock held?

Calling -release with a lock held is clearly insane, sorry.  It is true
that any engine-writer who does anything like utrace_* calls inside their
release callback is doing things the wrong way.  But guaranteeing that
simple mistakes result in spin-lock deadlocks just seems clearly wrong to
me.  A main point of the utrace API is to make it easier to work in this
space and help you avoid writing the pathological bugs.  Adding picayune
gotchas like this just does not help anyone.  No other API callback is made
holding some internal implementation lock, and making this one the sole
exception seems just obviously ill-advised on its face.  I can't really
imagine what a justification for such an obfuscation would be.

 But yeah, passing that list along does seem like a better solution.

So you find it cleaner to have each caller of utrace_reset take another
output parameter and be followed with the same exact source code duplicated
in each call site, than to have utrace_reset() do the unlock and then the
common code itself.  I guess there is no accounting for taste.  

We try not to get excited about trivia, so on matters like this one we will
do whatever the consensus of gate-keeping reviewers wants.  My patch to
implement your suggestion adds 13 lines of source and 134 bytes of compiled
text (x86-64).  Is that what you prefer?

I'll note that the code as it stands uses the __releases annotation for
sparse, as well as thoroughly documenting the locking details in comments.
I gather that clear explanation of the code is in your eyes no excuse for
ever writing code that requires one to actually read the comments.  I'm not
sure that attitude can ever be satisfied by any code that is nontrivial or
makes any attempts at optimization.


Thanks,
Roland



Re: [RFC,PATCH 14/14] utrace core

2009-12-13 Thread Roland McGrath
I'm sorry for the delay.  I'm picking up with responding to the parts of
your review that I did not include in my first reply.  I appreciate very
much the discussion you've had with Oleg about the issues that I did not
address myself.  I look forward to your replies to my comments in that
first reply, which we have yet to see.

 Seems inconsistent on u32 vs enum utrace_resume_action.
 
 Why force enum utrace_resume_action into a u32?

The first argument to almost all callbacks (all the ones made when the
task is alive) is called action and it's consistent that its low bits
contain an enum utrace_resume_action.  The argument is u32 action in
the report_signal and report_syscall_entry callbacks, where it combines
an enum utrace_resume_action with an enum utrace_{signal,syscall}_action
(respectively).  It would indeed be more consistent to use u32 action
throughout, but it seemed nicer not to have engine writers always
writing utrace_resume_action(action) for all the cases where there are
no other bits in there.

The return value is a mirror of the u32 action argument.  In many
calls, it has only an enum utrace_resume_action in it.  But in some it
combines that with another enum utrace_*_action.  There I went for
consistency (and less typing) in the return type, since it doesn't make
any difference to how you have to write the code in your callback
function.  The main reason I used u32 at all instead of unsigned int
is just its shortness for less typing.

I don't really mind changing these details to whatever people think is
best.  The other people writing code to use the utrace API may have more
opinions than I do.  I guess it could even be OK to make the return
value and action argument always a plain enum utrace_resume_action,
and use a second in/out enum utrace_{signal,syscall}_action *
parameter in those particular calls.  But that does put some more
register pressure and loads/stores into this API.

 Seems inconsistent in the bitfield type, also it feels like that 3 the 7
 and the enum should be more tightly integrated, maybe add:
 
 UTRACE_RESUME_MAX
 
 #define UTRACE_RESUME_BITS (ilog2(UTRACE_RESUME_MAX))
 #define UTRACE_RESUME_MASK ((1  UTRACE_RESUME_BITS) - 1)

Yes, that's a good cleanup.  Thanks.
(ilog2(7) is 2, so ilog2() + 1 is what you meant.)

  +static struct utrace_engine *matching_engine(
[...]
 The function does a search, suggesting the function name ought to have
 something like find or search in it.

Ok, I'll make it find_matching_engine.

 Quite gross this.. can't we key off the
 tracehoook_report_clone_complete() and use a wakeup there?

Yes, we intended to clean this up at some point later.  But doing that
is just a distraction and complication right now so we put it off.  For
this case, however, I suppose we could just punt for the initial version.

This code exists to support the special semantics of calling
utrace_attach_task() from inside the parent's report_clone() callback.
That guarantees the parent that it wins any race with any third thread
calling utrace_attach_task().  This guarantees it will be first attacher
in the callback order, but the general subject of callback order is
already something we think we will want to revisit in the future after
we have more experience with lots of different engines trying to work
together.  It's most meaningful when using the UTRACE_ATTACH_EXCLUSIVE
flag--then you can use UTRACE_ATTACH_EXCLUSIVE|UTRACE_ATTACH_MATCH_OPS
to synchronize with other threads trying to attach the same kind of
engine to a task, and give special priority in that to the engine that
attaches in report_clone() from tracing the parent.

Really I came up with this special semantics originally just with ptrace
in mind.  ptrace is an engine that wants to exclude other tracer threads
attaching asynchronously with the same kind of engine, and that wants to
give special priority on a child to the parent's tracer (to implement
PTRACE_O_TRACECLONE et al).  However, Oleg's current ptrace code still
relies on the old hard-coded locking kludges to exclude the separate
attachers and to magically privilege the auto-attach from the parent's
tracer.  So we are not relying on this special semantics yet.

We could just punt utrace_attach_delay() and remove the associated
documentation about the special rights of report_clone() calling
utrace_attach_task().  If we decide it helps clean things up later when
we finish more cleanup of the ptrace world, then we can add the fancy
semantics back in later.

 Does this really need the inline?

It has one caller and that call is unconditional.

 Asymmetric locking like this is really better not done, and looking at
 the callsites its really no bother to clean that up, arguably even makes
 them saner.

By assymetric you mean that utrace_reap releases a lock (as the
__releases annotation indicates).  As should be obvious from the code, the
unlock is done before the loop that does -report_reap callbacks and
utrace_engine_put() (which can make 

Re: [RFC,PATCH 14/14] utrace core

2009-12-08 Thread Peter Zijlstra
On Tue, 2009-12-08 at 16:04 +0100, Oleg Nesterov wrote:
 The
 problem is, this code was developed out-of-tree. That is why we would
 like to merge it asap, then do other changes which could be easily
 reviewed.
 
 Now, do you really mean we should throw out the working code, rewrite
 it avoiding these barriers, and resubmit? Sure, everything is possible.
 But this means another round of out-of-tree development with unclear
 results. 

Out-of-tree development is bad, it having taken lot of effort is no
excuse for merging ugly.

Now, I'm not against barriers at all, but code that is as barrier heavy
as this, with such bad comments and no clear indication it was actually
worth using so many barriers make me wonder.

Barriers aren't free either, and having multiple such things in quick
succession isn't nessecarily faster than a lock, but much less obvious.





Re: [RFC,PATCH 14/14] utrace core

2009-12-08 Thread Oleg Nesterov
On 12/08, Peter Zijlstra wrote:

 On Tue, 2009-12-08 at 16:04 +0100, Oleg Nesterov wrote:
 
  Well, this is subjective, but I don't agree that
 
  get_task_struct(task);
  task-utrace_flags = flags;
  spin_unlock(utrace-lock);
  put_task_struct(task);
 
  looks better.

 No, what I mean by assymetric locking is that utrace_reset() and
 utrace_reap() drop the utrace-lock where their caller acquired it,
 resulting in non-obvious like:

 utrace_control()
 {

   ...
   spin_lock(utrace-lock);

   ...

   if (reset)
 utrace_reset(utrace);
   else
 spin_unlock(utrace-lock);
 }

Agreed, the code like this never looks good.

 If you take a task ref you can write the much saner:

 utrace_control()
 {
   ...
   spin_lock(utrace-lock);
   ...
   if (reset)
 utrace_reset(utrace);

   spin_unlock(utrace-lock);
 }

No, get_task_struct() in utrace_reset() can't help, we should move
it into utrace_control() then. And in this case it becomes even more
subtle: it is needed because -utrace_flags may be cleared inside
utrace_reset() and after that utrace_control()-spin_unlock() becomes
unsafe.

Also. utrace_reset() drops utrace-lock to call put_detached_list()
lockless. If we want to avoid the assymetric locking, every caller
should pass struct list_head *detached to utrace_reset(), drop
utrace-lock, and call put_detached_list().

Oleg.



Re: [RFC,PATCH 14/14] utrace core

2009-12-08 Thread Oleg Nesterov
On 12/08, Peter Zijlstra wrote:

 On Tue, 2009-12-08 at 16:04 +0100, Oleg Nesterov wrote:
  The
  problem is, this code was developed out-of-tree. That is why we would
  like to merge it asap, then do other changes which could be easily
  reviewed.
 
  Now, do you really mean we should throw out the working code, rewrite
  it avoiding these barriers, and resubmit? Sure, everything is possible.
  But this means another round of out-of-tree development with unclear
  results.

 Out-of-tree development is bad, it having taken lot of effort is no
 excuse for merging ugly.

 Now, I'm not against barriers at all, but code that is as barrier heavy
 as this, with such bad comments and no clear indication it was actually
 worth using so many barriers make me wonder.

Well. First of all, I agree at least partly. If you ask me, I feel
that in any case utrace needs more cleanups (in fact, like almost
any code in kernel) even if we forget about the barriers. In no
way utrace is finished or perfect. I think that Roland won't argue ;)

But. It would be much easier to do the futher development step by
step, patch by patch, which the changelogs, with the possibilty to
have the review. And it is much easier to change the code which is
already used by people. And, cleanups/simplifications are the most
hard part of the development.

However, of course I can't prove that the current code is good
enough for merging.

 Barriers aren't free either, and having multiple such things in quick
 succession isn't nessecarily faster than a lock, but much less obvious.

It is hardly possible to argue.

Oleg.



Re: [RFC,PATCH 14/14] utrace core

2009-12-08 Thread Peter Zijlstra
On Tue, 2009-12-08 at 17:31 +0100, Oleg Nesterov wrote:

  If you take a task ref you can write the much saner:
 
  utrace_control()
  {
...
spin_lock(utrace-lock);
...
if (reset)
  utrace_reset(utrace);
 
spin_unlock(utrace-lock);
  }
 
 No, get_task_struct() in utrace_reset() can't help, we should move
 it into utrace_control() then. And in this case it becomes even more
 subtle: it is needed because -utrace_flags may be cleared inside
 utrace_reset() and after that utrace_control()-spin_unlock() becomes
 unsafe.

The task-utrace pointer is cleaned up on
free_task()-tracehook_free_task()-utrace_free_task(), so by holding a
ref on the task, we ensure -utrace stays around, and we can do
spin_unlock(), right?

 Also. utrace_reset() drops utrace-lock to call put_detached_list()
 lockless. If we want to avoid the assymetric locking, every caller
 should pass struct list_head *detached to utrace_reset(), drop
 utrace-lock, and call put_detached_list().

All that seems to do is call -release() and kmem_cache_free()s the
utrace_engine thing, why can't that be done with utrace-lock held?

But yeah, passing that list along does seem like a better solution.



Re: [RFC,PATCH 14/14] utrace core

2009-12-07 Thread Peter Zijlstra
On Tue, 2009-12-01 at 23:08 +0100, Oleg Nesterov wrote:

   @@ -560,6 +625,20 @@ static inline void tracehook_report_deat
   int signal, void *death_cookie,
   int group_dead)
{
   + /*
   +  * This barrier ensures that our caller's setting of
   +  * @task-exit_state precedes checking @task-utrace_flags here.
   +  * If utrace_set_events() was just called to enable
   +  * UTRACE_EVENT(DEATH), then we are obliged to call
   +  * utrace_report_death() and not miss it.  utrace_set_events()
   +  * uses tasklist_lock to synchronize enabling the bit with the
   +  * actual change to @task-exit_state, but we need this barrier
   +  * to be sure we see a flags change made just before our caller
   +  * took the tasklist_lock.
   +  */
   + smp_mb();
   + if (task_utrace_flags(task)  _UTRACE_DEATH_EVENTS)
   + utrace_report_death(task, death_cookie, group_dead, signal);
}
 
  I don't think its allowed to pair a mb with a lock-barrier, since the
  lock barriers are semi-permeable.
 
 Could you clarify?

According to memory-barriers.txt a mb can be paired with mb,wmb,rmb
depending on the situation.

For LOCK it states:

(1) LOCK operation implication:

 Memory operations issued after the LOCK will be completed after the LOCK
 operation has completed.

 Memory operations issued before the LOCK may be completed after the LOCK
 operation has completed.

Which is not something I can see making sense to pair with an mb.

So either the comment is confusing and its again referring to an UNLOCK
+LOCK pair, or there's something fishy

   @@ -589,10 +668,20 @@ static inline void set_notify_resume(str
 * asynchronously, this will be called again before we return to
 * user mode.
 *
   - * Called without locks.
   + * Called without locks.  However, on some machines this may be
   + * called with interrupts disabled.
 */
static inline void tracehook_notify_resume(struct pt_regs *regs)
{
   + struct task_struct *task = current;
   + /*
   +  * This pairs with the barrier implicit in set_notify_resume().
   +  * It ensures that we read the nonzero utrace_flags set before
   +  * set_notify_resume() was called by utrace setup.
   +  */
   + smp_rmb();
   + if (task_utrace_flags(task))
   + utrace_resume(task, regs);
}
 
  Sending an IPI implies the mb?
 
 Yes, but this has nothing to do with IPI. The caller, do_notify_resume(),
 does:
   clear_thread_flag(TIF_NOTIFY_RESUME);
   tracehook_notify_resume:
   if (task_utrace_flags(task))
   utrace_resume();
 
 We should not read task_utrace_flags() before we clear TIF_NOTIFY_RESUME.

Then this comment needs an update.. that wasn't at all clear to me.

   +static inline struct utrace *task_utrace_struct(struct task_struct *task)
   +{
   + struct utrace *utrace;
   +
   + /*
   +  * This barrier ensures that any prior load of task-utrace_flags
   +  * is ordered before this load of task-utrace.  We use those
   +  * utrace_flags checks in the hot path to decide to call into
   +  * the utrace code.  The first attach installs task-utrace before
   +  * setting task-utrace_flags nonzero, with a barrier between.
   +  * See utrace_task_alloc().
   +  */
   + smp_rmb();
   + utrace = task-utrace;
   +
   + smp_read_barrier_depends(); /* See utrace_task_alloc().  */
   + return utrace;
   +}
 
  I spot two barriers here, but only 1 over in utrace_task_alloc(), hmm?
 
 smp_read_barrier_depends() pairs with utrace_task_alloc()-wmb().
 
 smp_rmb() is needed for another reason. Suppose the code does:
 
   if (task_utrace_flags()  SOMETHING)
   do_something_with(task-utrace);
 
 if we race with utrace_attach_task(), we can see -utrace_flags != 0
 but task-utrace == NULL without rmb().

Still not clear what we pair with.. There is no obvious barrier in
utrace_attach_task() nor a comment referring to this site.

   +struct utrace_engine {
   +/* private: */
   + struct kref kref;
   + void (*release)(void *);
   + struct list_head entry;
   +
   +/* public: */
   + const struct utrace_engine_ops *ops;
   + void *data;
   +
   + unsigned long flags;
   +};
 
  Sorry, the kernel is written in C, not C++.
 
 Hmm. I almost never read the comments, but these 2 look very clear
 to me ;)

I see no point in adding comments like that at all.

   + * Most callbacks take an @action argument, giving the resume action
   + * chosen by other tracing engines.  All callbacks take an @engine
   + * argument, and a @task argument, which is always equal to @current.
 
  Given that some functions have a lot of arguments (depleting regparam),
  isn't it more expensive to push current on the stack than it is to
  simply read it again?
 
 Yes, perhaps. Only -report_reap() really needs @task, it may be
 !current.
 
   +struct utrace_engine_ops {
 
   + u32 (*report_signal)(u32 action,
   +  struct 

Re: [RFC,PATCH 14/14] utrace core

2009-12-02 Thread Oleg Nesterov
On 12/01, Peter Zijlstra wrote:

  +static inline __must_check int utrace_control_pid(
  +   struct pid *pid, struct utrace_engine *engine,
  +   enum utrace_resume_action action)
  +{
  +   /*
  +* We don't bother with rcu_read_lock() here to protect the
  +* task_struct pointer, because utrace_control will return
  +* -ESRCH without looking at that pointer if the engine is
  +* already detached.  A task_struct pointer can't die before
  +* all the engines are detached in release_task() first.
  +*/
  +   struct task_struct *task = pid_task(pid, PIDTYPE_PID);
  +   return unlikely(!task) ? -ESRCH : utrace_control(task, engine, action);
  +}

 Is that comment correct? Without rcu_read_lock() the pidhash can change
 under our feet and maybe cause funny things?

I already tried to answer, but I guess my email was not very clear. Let me
try again.

pid_task() by itself is safe, but yes, it is possible that utrace_control()
is called with target == NULL, or this task_task was already freed/reused.

utrace_control(target) path does not use target until it verifies it is
safe to dereference it.

get_utrace_lock() calls rcu_read_lock() and checks that engine-ops
is not cleared (NULL or utrace_detached_ops). If we see the valid -ops
under rcu_read_lock() it is safe to dereference target, even if we race
with release_task() we know that it has not passed utrace_release_task()
yet, and thus we know call_rcu(delayed_put_task_struct) was not yet
called _before_ we took rcu_read_lock().

If it is safe to dereference target, we can take utrace-lock. Once
we take this lock (and re-check engine-ops) the task can't go away
until we drop it, get_utrace_lock() drops rcu lock and returns with
utrace-lock held.

utrace_control() can safely play with target under utrace-lock.

  +   /*
  +* If this flag is still set it's because there was a signal
  +* handler setup done but no report_signal following it.  Clear
  +* the flag before we get to user so it doesn't confuse us later.
  +*/
  +   if (unlikely(utrace-signal_handler)) {
  +   spin_lock(utrace-lock);
  +   utrace-signal_handler = 0;
  +   spin_unlock(utrace-lock);
  +   }

 OK, so maybe you get to explain why this works..

Missed this part yesterday.

Well. -signal_handler is set by handle_signal() when the signal was
delivered to the tracee. This flag is checked by utrace_get_signal()
to detect the stepping. But we should not return to user-mode with
this flag set, that is why utrace_resume() clears it.

However. This reminds me we were going to try to simplify this logic,
I'll try to think about this.

Oleg.



Re: [RFC,PATCH 14/14] utrace core

2009-12-01 Thread Peter Zijlstra
On Tue, 2009-11-24 at 21:02 +0100, Oleg Nesterov wrote:

 +  sect2 id=reaptitleFinal callbacks/title
 +  para
 +The functionreport_reap/function callback is always the final event
 +in the life cycle of a traced thread.  Tracing engines can use this as
 +the trigger to clean up their own data structures.  The
 +functionreport_death/function callback is always the penultimate
 +event a tracing engine might see; it's seen unless the thread was
 +already in the midst of dying when the engine attached.  Many tracing
 +engines will have no interest in when a parent reaps a dead process,
 +and nothing they want to do with a zombie thread once it dies; for
 +them, the functionreport_death/function callback is the natural
 +place to clean up data structures and detach.  To facilitate writing
 +such engines robustly, given the asynchrony of
 +constantSIGKILL/constant, and without error-prone manual
 +implementation of synchronization schemes, the
 +applicationutrace/application infrastructure provides some special
 +guarantees about the functionreport_death/function and
 +functionreport_reap/function callbacks.  It still takes some care
 +to be sure your tracing engine is robust to tear-down races, but these
 +rules make it reasonably straightforward and concise to handle a lot of
 +corner cases correctly.
 +  /para
 +  /sect2

This above text seems inconsistent. First you say report_reap() is the
final callback and should be used for cleanup, then you say
report_death() is the penultimate callback but might not always happen
and people would normally use that as cleanup.

If we cannot rely on report_death() then I would suggest to simply
recommend report_reap() as cleanup callback and leave it at that.

 +  para
 +There is nothing a kernel module can do to keep a structnamestruct
 +task_struct/structname alive outside of
 +functionrcu_read_lock/function. 

Sure there is, get_task_struct() comes to mind.

  When the task dies and is reaped
 +by its parent (or itself), that structure can be freed so that any
 +dangling pointers you have stored become invalid.
 +applicationutrace/application will not prevent this, but it can
 +help you detect it safely.  By definition, a task that has been reaped
 +has had all its engines detached.  All
 +applicationutrace/application calls can be safely called on a
 +detached engine if the caller holds a reference on that engine pointer,
 +even if the task pointer passed in the call is invalid.  All calls
 +return constant-ESRCH/constant for a detached engine, which tells
 +you that the task pointer you passed could be invalid now.  Since
 +functionutrace_control/function and
 +functionutrace_set_events/function do not block, you can call those
 +inside a functionrcu_read_lock/function section and be sure after
 +they don't return constant-ESRCH/constant that the task pointer is
 +still valid until functionrcu_read_unlock/function.  The
 +infrastructure never holds task references of its own.  

And here you imply its existence.

 Though neither
 +functionrcu_read_lock/function nor any other lock is held while
 +making a callback, it's always guaranteed that the structnamestruct
 +task_struct/structname and the structnamestruct
 +utrace_engine/structname passed as arguments remain valid
 +until the callback function returns.
 +  /para
 +
 +  para
 +The common means for safely holding task pointers that is available to
 +kernel modules is to use structnamestruct pid/structname, which
 +permits functionput_pid/function from kernel modules.  When using
 +that, the calls functionutrace_attach_pid/function,
 +functionutrace_control_pid/function,
 +functionutrace_set_events_pid/function, and
 +functionutrace_barrier_pid/function are available.
 +  /para

The above seems weird to me at best... why hold a pid around when you
can keep a ref on the task struct? A pid might get reused leaving you
with a completely different task than you thought you had.

 +  /sect2
 +
 +  sect2 id=reap-after-death
 +title
 +  Serialization of constantDEATH/constant and 
 constantREAP/constant
 +/title
 +para
 +  The second guarantee is the serialization of
 +  constantDEATH/constant and constantREAP/constant event
 +  callbacks for a given thread.  The actual reaping by the parent
 +  (functionrelease_task/function call) can occur simultaneously
 +  while the thread is still doing the final steps of dying, including
 +  the functionreport_death/function callback.  If a tracing engine
 +  has requested both constantDEATH/constant and
 +  constantREAP/constant event reports, it's guaranteed that the
 +  functionreport_reap/function callback will not be made until
 

Re: [RFC,PATCH 14/14] utrace core

2009-12-01 Thread Oleg Nesterov
On 12/01, Peter Zijlstra wrote:

 On Tue, 2009-11-24 at 21:02 +0100, Oleg Nesterov wrote:

  +  para
  +There is nothing a kernel module can do to keep a structnamestruct
  +task_struct/structname alive outside of
  +functionrcu_read_lock/function.

 Sure there is, get_task_struct() comes to mind.

it is not exported ;)

Peter, I skipped other comments about the documentation, I never read
it myself. Update: I skipped a lot more for today ;)

  @@ -351,6 +394,10 @@ static inline void tracehook_report_vfor
*/
   static inline void tracehook_prepare_release_task(struct task_struct *task)
   {
  +   /* see utrace_add_engine() about this barrier */
  +   smp_mb();
  +   if (task_utrace_flags(task))
  +   utrace_release_task(task);
   }

 OK, that seems to properly order -exit_state vs -utrace_flags,

 This site does:

  assign -state
  mb
  observe -utrace_flags

 and the other site does:

  assign -utrace_flags
  mb
  observe -exit_state

Yes, we hope.

  @@ -560,6 +625,20 @@ static inline void tracehook_report_deat
int signal, void *death_cookie,
int group_dead)
   {
  +   /*
  +* This barrier ensures that our caller's setting of
  +* @task-exit_state precedes checking @task-utrace_flags here.
  +* If utrace_set_events() was just called to enable
  +* UTRACE_EVENT(DEATH), then we are obliged to call
  +* utrace_report_death() and not miss it.  utrace_set_events()
  +* uses tasklist_lock to synchronize enabling the bit with the
  +* actual change to @task-exit_state, but we need this barrier
  +* to be sure we see a flags change made just before our caller
  +* took the tasklist_lock.
  +*/
  +   smp_mb();
  +   if (task_utrace_flags(task)  _UTRACE_DEATH_EVENTS)
  +   utrace_report_death(task, death_cookie, group_dead, signal);
   }

 I don't think its allowed to pair a mb with a lock-barrier, since the
 lock barriers are semi-permeable.

Could you clarify?

  @@ -589,10 +668,20 @@ static inline void set_notify_resume(str
* asynchronously, this will be called again before we return to
* user mode.
*
  - * Called without locks.
  + * Called without locks.  However, on some machines this may be
  + * called with interrupts disabled.
*/
   static inline void tracehook_notify_resume(struct pt_regs *regs)
   {
  +   struct task_struct *task = current;
  +   /*
  +* This pairs with the barrier implicit in set_notify_resume().
  +* It ensures that we read the nonzero utrace_flags set before
  +* set_notify_resume() was called by utrace setup.
  +*/
  +   smp_rmb();
  +   if (task_utrace_flags(task))
  +   utrace_resume(task, regs);
   }

 Sending an IPI implies the mb?

Yes, but this has nothing to do with IPI. The caller, do_notify_resume(),
does:
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume:
if (task_utrace_flags(task))
utrace_resume();

We should not read task_utrace_flags() before we clear TIF_NOTIFY_RESUME.

  +static inline struct utrace *task_utrace_struct(struct task_struct *task)
  +{
  +   struct utrace *utrace;
  +
  +   /*
  +* This barrier ensures that any prior load of task-utrace_flags
  +* is ordered before this load of task-utrace.  We use those
  +* utrace_flags checks in the hot path to decide to call into
  +* the utrace code.  The first attach installs task-utrace before
  +* setting task-utrace_flags nonzero, with a barrier between.
  +* See utrace_task_alloc().
  +*/
  +   smp_rmb();
  +   utrace = task-utrace;
  +
  +   smp_read_barrier_depends(); /* See utrace_task_alloc().  */
  +   return utrace;
  +}

 I spot two barriers here, but only 1 over in utrace_task_alloc(), hmm?

smp_read_barrier_depends() pairs with utrace_task_alloc()-wmb().

smp_rmb() is needed for another reason. Suppose the code does:

if (task_utrace_flags()  SOMETHING)
do_something_with(task-utrace);

if we race with utrace_attach_task(), we can see -utrace_flags != 0
but task-utrace == NULL without rmb().

  +struct utrace_engine {
  +/* private: */
  +   struct kref kref;
  +   void (*release)(void *);
  +   struct list_head entry;
  +
  +/* public: */
  +   const struct utrace_engine_ops *ops;
  +   void *data;
  +
  +   unsigned long flags;
  +};

 Sorry, the kernel is written in C, not C++.

Hmm. I almost never read the comments, but these 2 look very clear
to me ;)

  + * Most callbacks take an @action argument, giving the resume action
  + * chosen by other tracing engines.  All callbacks take an @engine
  + * argument, and a @task argument, which is always equal to @current.

 Given that some functions have a lot of arguments (depleting regparam),
 isn't it more expensive to push current on the stack than it is to
 simply read it again?

Yes, perhaps. Only -report_reap() really needs @task, it may be

Re: [RFC,PATCH 14/14] utrace core

2009-12-01 Thread Roland McGrath
 Could we just drop the tracehook layer if this finally merged
 and call the low level functions directly?

We can certainly do appropriate streamlining cleanups later, by all
means.  The original purpose of the tracehook layer is simply this:

A person hacking on core kernel code or arch code should not have to
think about all the innards of tracing (ptrace, utrace, or anything
else).  If he comes across a tracehook_* call, he can just read its
kernel-doc for explanation of its parameters, return value, what it
expects about its context (locking et al), and what the semantic
significance of making that particular call is.  If changes to the
core/arch code in question do not require changing any of those
aspects, then said person need not consider tracing issues further.
If a change to a function's calling interface or contextual
assumptions looks warranted, then he knows he should discuss the
details with some tracing-related folks (i.e. find tracehook.h in
MAINTAINERS and thus get to me and Oleg).

Likewise, a person hacking on tracing code should not have to think
about every corner of interaction with the core code or arch code.
Each tracehook_* call's kernel-doc comments say everything that
matters about how and where it's called.  If some of those details
are problematic for what we want to do inside the tracing code, then
we know we have to hash out the details with the maintainers of the
core or arch code that makes those calls.  Otherwise we can keep our
focus on tracing infrastructure without spending time getting lost
in arcane details of the arch or core kernel code.

These two things seem permanently worthwhile to me: that the core
and arch source code use simple function calls without open-coding
any innards of the tracing infrastructure; and that these functions
have clear and complete documentation about their calling interfaces
and context (locking et al).  I find it natural and convenient that
such calls have a common prefix that makes it obvious to any reader
of the core code what subsystem the call relates to.

Beyond those ideas, I certainly don't care at all what the names of
these functions are, what common prefix is used, or in which source
files those declarations and kernel-doc comments appear.


Thanks,
Roland



Re: [RFC,PATCH 14/14] utrace core

2009-12-01 Thread Roland McGrath
 This above text seems inconsistent. First you say report_reap() is the
 final callback and should be used for cleanup, then you say
 report_death() is the penultimate callback but might not always happen
 and people would normally use that as cleanup.

 If we cannot rely on report_death() then I would suggest to simply
 recommend report_reap() as cleanup callback and leave it at that.

I'm sorry the explanation was not clear to you.  I'll try to make it clear
now and then I'd appreciate your suggestions on how the documentation could
be worded better.  (I do appreciate your suggestion here but it's one based
on an idea that's not factual, so I don't think we should follow it.)

There is no unreliable aspect to it.  If you call utrace_set_events() to
ask for report_death() callbacks then you will get that callback for that
task--guaranteed--unless utrace_set_events() returned an error code that
tells you unambiguously that you could not get that callback.

What's true is that report_reap() is the last callback you can ever
get for a given task.  If you had asked for report_death() callbacks,
then you always get report_death() and if you've asked for both these
callbacks then report_death() is always before report_reap().  
(This is a special ordering guarantee, because in actuality the
release_task() call that constitutes reap can happen in the parent
simultaneously with the task's own exit path.)

The one situation in which you cannot get report_death() is when the task
was already dead when you called utrace_set_events().  In that case, it
gives an error code as I mentioned above.  Even in that situation, you can
still ask to enable just the report_reap() callback.  With either of these,
if you enabled it successfully, then you are guaranteed to get it.

When you get report_death() and are not interested in getting report_reap()
afterwards, then you can return UTRACE_DETACH from report_death().  If you
don't detach, then you will get report_reap() later too.  The documentation
mentions using report_death() to detach and clean up because many kinds of
uses will have no interest in report_reap() at all.  The only reason to get
a report_reap() callback is if you are interested in tracking when a task's
(real) parent reaps it with wait* calls, but usually people are only really
interested in a task until it dies.

  +  para
  +There is nothing a kernel module can do to keep a structnamestruct
  +task_struct/structname alive outside of
  +functionrcu_read_lock/function. 
 
 Sure there is, get_task_struct() comes to mind.

__put_task_struct() is not exported to modules and so put_task_struct()
cannot be used by modules.

  +still valid until functionrcu_read_unlock/function.  The
  +infrastructure never holds task references of its own.  
 
 And here you imply its existence.

[I take it this refers to the next quoted bit:]

  Though neither
  +functionrcu_read_lock/function nor any other lock is held while
  +making a callback, it's always guaranteed that the structnamestruct
  +task_struct/structname and the structnamestruct
  +utrace_engine/structname passed as arguments remain valid
  +until the callback function returns.

No, we do not imply that the utrace infrastructure holds any task
reference.  The current task_struct is always live while it's running
and until it's passed to release_task().  The task_struct passed to
callbacks is current.  That's all it means.  

The true facts are that utrace callbacks are all made in the current task
except for report_reap(), which is made at the start of release_task().
So the kernel's core logic is holding a task reference at all times that
utrace callbacks are made.  If you really think it is clearer to explain
that set of facts as utrace holds a reference, then please suggest the
exact wording you have in mind.

 The above seems weird to me at best... why hold a pid around when you
 can keep a ref on the task struct? A pid might get reused leaving you
 with a completely different task than you thought you had.

The *_pid interfaces are only there because put_pid() can be used by
modules while put_task_struct() cannot.  If we can make __put_task_struct()
an exported symbol again, I would see no reason for these *_pid calls.

 Right, except you cannot generally rely on this report_death() thing, so
 a trace engine needs to deal with the report_reap() thing anyway.

This is a false antecedent.  I hope the explanation above made this
subject clear to you.

  +  sect2 id=interlocktitleInterlock with final callbacks/title
[...]
 Hrmm, better mention this earlier, or at least refer to this when
 describing the above cleanup bits.

This explanation is somewhat long and has its own whole section so as to
be thoroughly explicit and clear.  Do you think there should just be a
cross reference here in the earlier mention of report_reap() and
report_death()?  Or do you think 

Re: [RFC,PATCH 14/14] utrace core

2009-11-25 Thread Andi Kleen
 This is subjective, but personally I disagree. Contrary, imho it
 is good that tracehook hides the (simple) details. I do not understand
 why the reader of, say, do_fork() should see the contents of
 tracehook_report_clone_complete(). This will complicate the understanding.

Someone who has to debug or review fork needs to know what's going on.

Yes they can find out by going through inlines, but that
just costs more time and distracts from the actual problem.

 Those people who want to understand/change fork() do not care about
 utrace/ptrace usually.
 
 And please note that it is much, much easier to change this code
 when it lives in tracehooks.h instead of sched.c/signal.c/etc.

The problem is that when you have to trace this code when something
goes wrong the extra layer just holds you up. For debugging usually
abstraction is a bad idea.

My experience is also that in general such extra abstraction layers
are frowned upon in Linux kernel code style. For example when new
vendor drivers are submitted for hardware like NICs etc,
they frequently tend to have all kinds of abstraction layers.
Typically the first step to linuxify them is to get rid of those. 

This makes the code more readable, shorter, better to debug and read.

Another classic example is: lock_foo() is frowned upon (Linus
tends to always complain about that), rather prefer spin_lock(foo_lock)
or mutex_lock(foo_lock) that makes it clear what's going on.

I don't see why this should be any different for utrace.

  Less code obfuscation.
 
  When it's a utrace call, call it a utrace call, not something else.
 
 Why do you think this is obfuscation? Well, we can rename these
 helpers, s/tracehook_/utrace_/, but I don't see how this can make
 the code more readable.

Because the inlines do not add anything to functionality and actually
hide what the code does, that is obfuscation. For you it might be obvious
because you've been hacking that code for quite some time, but for 
someone who is not in your position that's different.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.



Re: [RFC,PATCH 14/14] utrace core

2009-11-25 Thread Ingo Molnar

* Oleg Nesterov o...@redhat.com wrote:

 Much better. But in this case please note that most of tracehooks just 
 do:
 
   if (unlikely(task_utrace_flags(current)  SOME_EVENT))
   utrace_report_some_event();
 
 I really don't understand why we shouldn't have (trivial!) helpers for 
 this. (As for naming - personally I do not care at all ;)

We prefer helpers in most such cases - especially when in the normal 
case the helper has no side effects - as here. Then we want to compress 
all such reporting/callback as much as possible.

Using helpers to abstract away functionality is one of the basic 
elements of writing clean kernel code.

Ingo



Re: [RFC,PATCH 14/14] utrace core

2009-11-24 Thread Andi Kleen
Oleg Nesterov o...@redhat.com writes:

 From: Roland McGrath rol...@redhat.com

 This adds the utrace facility, a new modular interface in the kernel
 for implementing user thread tracing and debugging.  This fits on top
 of the tracehook_* layer, so the new code is well-isolated.

Could we just drop the tracehook layer if this finally merged
and call the low level functions directly?

It might have been reasonably early on when it was still out of tree,
but longer term when it's integrated having strange opaque hooks
like that just makes the coder harder to read and maintain.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.



Re: [RFC,PATCH 14/14] utrace core

2009-11-24 Thread Oleg Nesterov
On 11/24, Andi Kleen wrote:

 Oleg Nesterov o...@redhat.com writes:

  From: Roland McGrath rol...@redhat.com
 
  This adds the utrace facility, a new modular interface in the kernel
  for implementing user thread tracing and debugging.  This fits on top
  of the tracehook_* layer, so the new code is well-isolated.

 Could we just drop the tracehook layer if this finally merged
 and call the low level functions directly?

Not sure I understand. Tracehooks are trivial inline wrappers on
top utrace calls,

 It might have been reasonably early on when it was still out of tree,
 but longer term when it's integrated having strange opaque hooks
 like that just makes the coder harder to read and maintain.

Well, I don't think the code will be better if we remove tracehooks.

For example. tracehook_report_syscall_entry() has a lot of callers
in arch/, each callsite should be changed to do

if ((task_utrace_flags(current)  UTRACE_EVENT(SYSCALL_ENTRY)) 
utrace_report_syscall_entry(regs))
ret = -1; // this depends on machine

instead of simply calling tracehook_report_syscall_entry().

What is the point?

But again, perhaps I misunderstood you.

Oleg.



Re: [RFC,PATCH 14/14] utrace core

2009-11-24 Thread Andi Kleen
On Tue, Nov 24, 2009 at 09:41:52PM +0100, Oleg Nesterov wrote:
 On 11/24, Andi Kleen wrote:
 
  Oleg Nesterov o...@redhat.com writes:
 
   From: Roland McGrath rol...@redhat.com
  
   This adds the utrace facility, a new modular interface in the kernel
   for implementing user thread tracing and debugging.  This fits on top
   of the tracehook_* layer, so the new code is well-isolated.
 
  Could we just drop the tracehook layer if this finally merged
  and call the low level functions directly?
 
 Not sure I understand. Tracehooks are trivial inline wrappers on
 top utrace calls,

Yes that's the problem -- they are unnecessary obfuscation
when you can just call directly.

 
  It might have been reasonably early on when it was still out of tree,
  but longer term when it's integrated having strange opaque hooks
  like that just makes the coder harder to read and maintain.
 
 Well, I don't think the code will be better if we remove tracehooks.
 
 For example. tracehook_report_syscall_entry() has a lot of callers
 in arch/, each callsite should be changed to do
 
   if ((task_utrace_flags(current)  UTRACE_EVENT(SYSCALL_ENTRY)) 
   utrace_report_syscall_entry(regs))
   ret = -1; // this depends on machine
 
 instead of simply calling tracehook_report_syscall_entry().

That should be in the utrace code?

I don't have a problem with having common code somewhere,
just not a whole layer whose only purpose seems to be obfuscation.


 What is the point?

Less code obfuscation.

When it's a utrace call, call it a utrace call, not something else.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.



Re: [RFC,PATCH 14/14] utrace core

2009-11-24 Thread Frank Ch. Eigler
Hi -

On Tue, Nov 24, 2009 at 10:26:19PM +0100, Andi Kleen wrote:
 [...]
  For example. tracehook_report_syscall_entry() has a lot of callers
  in arch/, each callsite should be changed to do
  
  if ((task_utrace_flags(current)  UTRACE_EVENT(SYSCALL_ENTRY)) 
  utrace_report_syscall_entry(regs))
  ret = -1; // this depends on machine
  
  instead of simply calling tracehook_report_syscall_entry().
 
 That should be in the utrace code?
 
 I don't have a problem with having common code somewhere,
 just not a whole layer whose only purpose seems to be obfuscation.

One man's obfuscation is another man's abstraction.
Would you be satisfied if tracehook_ was renamed utracehook_?


- FChE