Re: asynchronous detach

2008-08-02 Thread Ananth N Mavinakayanahalli
On Thu, Jul 31, 2008 at 03:38:02PM -0700, Roland McGrath wrote:
 
...

  I think this can be a good model to use for non-perturbing quiesce for
  cases as breakpoint insertion and removal or any applicaton text
  modification, where one needs to ensure no thread is executing in the
  vicinity of the said modification. So, even on an -EINPROGRESS on a
  utrace_control, if the thread is in kernel, manipulating application
  text is still safe, right?
 
 For this particular case, maybe safe enough.  I wouldn't like to
 encourage thinking of it as safe in general.  Whatever the kernel is
 doing might be touching user memory, or unmapping it or whatever.  For
 breakpoint insertion/removal, you are pretty much assuming nothing else
 would ever touch the text anyway.  You still need an interlock against
 munmap text + mmap reusing addrs in quick succession.  I suppose you
 could hold mmap_sem or something.
 
  Not sure if a similar model can be used to address the teardown races
  problem.
 
 Hmm.  I mentioned before having a non-synchronizing record of the
 callback in progress.  This lets UTRACE_DETACH return -EINPROGRESS,
 which is after the horse has left the barn, as it were.  There is a
 related extension of this idea I had vaguely in mind but didn't mention.
 A utrace_set_events call that disables event bits could also return
 -EINPROGRESS when some callback might be in progress.  As with detach,
 that means a zero return tells you all is well.  Unlike detach, if you
 get -EINPROGRESS from utrace_set_events, you still have the engine.

So, the intention is that -EINPROGRESS can be returned only on a
utrace_set_events(task, engine, 0), right?

Actually, I think it would also be useful to just have a
utrace_clear_events_sync() that encapsulates the above call. I can see
use for it outside of an asynchronous detach, for cases when one needs
to just turn tracing events off on a thread. (It is definitely not
recommended for debugger/probes usage with breakpoints already set in
user code, but would be useful for someone doing non-invasive syscall
and process lifetime tracing).
 
 If your callback never blocks, that is enough to know that it's safe to
 free your data, unload your module, etc.  (Or if your callback can block
 but what you want to synchronize with is just what it does before it
 blocks.  That's when you don't care about being able to unload the
 module, just juggling your own data structures.)
 
 For the general case of some unknown callback code that might include
 blocking, you could instead do:
 
   ret = utrace_set_events(task, engine, 0);
   while (ret == -EINPROGRESS) {
   ret = utrace_prepare_examine(task, engine, exam);
   if (ret == -EAGAIN)
   yield();
   ret = utrace_set_events(task, engine, 0);
   if (ret == -EINPROGRESS)
   yield();
   }
 
 That is: when it is blocked, check again if a callback still might be in
 progress, and loop.  Assuming your callback is well-behaved and only
 ever blocks for a short time (like kmalloc), this will return soon.

This can cetainly work!

 yield(); should actually be schedule_timeout_interruptible(1); probably,
 and it should be able to return -ERESTARTSYS.
 
 If this is a good plan, then we should encapsulate it in a new call
 utrace_set_events_sync() or something like that.  So then the safe
 detach procedure would be just:
 
   ret = utrace_set_events_sync(task, engine, UTRACE_EVENT(REAP));
   if (!ret)
   ret = utrace_control(task, engine, UTRACE_DETACH);
 
 This can only return zero or -ESRCH.  Zero says that all callbacks are
 definitely finished.  -ESRCH says that your report_reap callback is
 definitely happening (right now, modulo preemption).
 
 We could encapsulate that into utrace_detach_sync() too.

Yes, this does take care of the thread about to die case too.

Ananth



Re: asynchronous detach

2008-08-02 Thread Roland McGrath
 So, the intention is that -EINPROGRESS can be returned only on a
 utrace_set_events(task, engine, 0), right?

No, for any call when some bits had been enabled before and there hasn't
been an intervening safe point.  My idea was that it would return 0 only
when it's sure that no disabled callback might be made.  You'd get
-EINPROGRESS any time we aren't 100% positive of that guarantee.

 Actually, I think it would also be useful to just have a
 utrace_clear_events_sync() that encapsulates the above call. I can see
 use for it outside of an asynchronous detach, for cases when one needs
 to just turn tracing events off on a thread. (It is definitely not
 recommended for debugger/probes usage with breakpoints already set in
 user code, but would be useful for someone doing non-invasive syscall
 and process lifetime tracing).

I mentioned utrace_set_events_sync.

 Yes, this does take care of the thread about to die case too.

I've written code to implement most of this.  In working through it, I came
across a couple of other wrinkles I hadn't thought about before.  These
come up in thinking about the possible interactions with callbacks that
return UTRACE_DETACH.

Firstly, you can't call *_sync freely if any of your callbacks ever use
UTRACE_DETACH.  When any callback returns UTRACE_DETACH, immediately the
engine pointer is only kept alive by RCU.  You can't make a blocking call
with rcu_read_lock() held, so you can't be keeping it alive via RCU when
you all *_sync.

You can still use it, just not freely.  What that means is that you need
some other synchronization with your callbacks.  If your callback safely
updates a data structure saying already doing the detach, then you can
synchronize with that update and check it before calling utrace_detach_sync.
It takes a little thought to do that without deadlock, but it's doable.

There are already special cases for the report_death callback using
UTRACE_DETACH.  We can see that a death callback might be in progress just
from the task/utrace data structures, without actually looking at the
engine pointer passed in.  So we can relax that rule to allow using
UTRACE_DETACH in report_death without making *_sync unsafe.

The same sort of problems makes me unsure we can use interruptible blocks.

I'll think about this some more.


Thanks,
Roland



Re: asynchronous detach, global tracing

2008-07-31 Thread Frank Ch. Eigler
Roland McGrath [EMAIL PROTECTED] writes:

 [...]
 What the utrace interface has always said about this is, So don't do
 that.  [...]
 What I overlooked is that not just your data structures, but your
 callbacks too might be going away, i.e. unloading the kernel module.

I don't think the module-unloading case is so special.  If there exist
races involving utrace detach, then they will affect long-lasting
modules too that may want to do some utracing then some other stuff,
then perhaps return to utracing again.  In this scenario, the data too
is volatile or could be repurposed between utrace sessions.  Such a
module would need to know positively when no further callbacks will
arrive.

 [...]  For global engines' detach, one option is to offer no help
 with your own data structures but to solve the module-unload problem
 using the module refcount.  [...]

If having a per-cpu counter vector is sufficiently low weight for
utrace to update it around every callback, how about letting a utrace
engine specify an (optional?) percpu-integer vector?  Then, the utrace
client could use a similar synchronization algorithm as that of
module/refcount unloading to assure itself of a complete and final
utrace detach.  It could even opt to reuse the counters between
engines, or between utracing sessions, if it knows that its data/code
lifetimes can work with that.

- FChE