RCU, reference counts

2008-08-04 Thread Roland McGrath
Something that's always been an issue in the utrace interface is the
management of struct utrace_attached_engine.  It's tricky that you have to
use RCU and/or follow picayune rules in the attach+set_events-callbacks
sequence to be sure your engine pointer is valid.  From the beginning, I
expected to revisit this aspect of the interface and think it through again
at some point.  But I hadn't really thought about it in a long time now.

The discussion here on the asynchronous detach problem has helped me think
through the details that impinge on the concerns that had led me to write
it this way originally.  I'm grateful to David and Ananth for the detailed
exchange that helps air these things out.  Please keep it up!

We've been talking about all these corner cases with races, where I list
all the caveats for interactions with detach.  What these all come down to
is that your engine pointer might have been freed.  RCU can protect against
that, but it gets sticky if you want to block.  The common mechanism for
dealing with this kind of issue in interfaces is reference counted objects.
In the original implementation, I was resistant to using reference counts
on engines for a few reasons.

One reason was the idea that the interface could make it hard to leak.
I do still think it's good that e.g. the interface not encourage you to
hold task refs (i.e. you don't need them because you'll get a final
report_reap callback before the struct goes away).  But for the engine
structs, the option opposite leaks is to make the failure mode likely in
sloppy modules that they use a dangling engine pointer.  That is most
likely more dangerous than leaks, and especially likely to do something
bad to someone else's engine (since the pointer will be reused).  So
that objection to reference counts just doesn't hold any water.

Other concerns had to do with having lockless reporting loops, and
originally that seemed to require RCU for the structs anyway.  But that
part of the implementation has been straightened out since.  How we
manage the lockless list now is completely independent of RCU lists.
I've also objected to reference counts being touched for reporting,
and I still do.  But they don't have to be.

The callback synchronization thing is simpler when we have an engine
pointer that can be relied on, even when you block.  I'm guessing
everyone will be happy to see reference counts replace RCU in the
interface for dealing with struct utrace_attached_engine.

So here is what I'm thinking:

utrace_attach returns a ref for the caller.  You can use
utrace_engine_get(engine) and utrace_engine_put(engine) to manage the
pointer.  When you drop the last ref, the struct is freed.  There are no
release/destructor callbacks for that, the refcount is purely for the
purpose of keeping the pointer valid.  

An implicit ref is held while the engine is attached.  The engine
pointer can always be used during a callback because of this ref.  Your
callback can add a new ref even if you had dropped all other refs.  If a
callback returns UTRACE_DETACH, then this ref will be dropped (and might
be the last).

Since it's easy to keep the engine pointer on hand, we really only need
one synchronizing call.  So now there will be no *_sync calls after all.
utrace_set_events and utrace_control(,UTRACE_DETACH) can return
-EINPROGRESS as I described before.  This means the change of event
mask, or the detach, will take effect--but there might be a callback
already in progress or unavoidably on its way.  To block until that
callback is done, you can call utrace_barrier.

So now the safe detach procedure would be:

ret = utrace_control(task, engine, UTRACE_DETACH);
if (ret == -EINPROGRESS)
ret = utrace_barrier(task, engine);
if (ret == -ERESTARTSYS) /* interrupted wait, maybe callback pending */
return ret;
utrace_engine_put(engine);

I haven't really tested the new code at all.  But this took not much
work at all to change in the implementation.  If this is the right
direction, I'll give the code a modicum of testing and update the
documentation bits after another once-over on Monday.


Thanks,
Roland



Re: global tracing

2008-08-04 Thread David Smith
Roland McGrath wrote:
 We've mentioned global tracing.  I think it's time now to discuss it
 thoroughly and decide what we do or don't want to do.

...

 2. Why do we want utrace global tracing?

From a systemtap point of view, we'd certainly use global tracing.

...

 3. What would it look like?
 
 Global engines' callbacks all run after all per-task engine callbacks.
 (This could change in future.)

I guess in a perfect world callbacks would still be called in the
order they were attached.  But, if calling the global callbacks last
makes things easier, I think systemtap could handle it.

 I had originally planned to rule out SYSCALL events for global tracing.
 The reason is that this is not like other event checks where a simple
 flag gets checked cheaply.  Instead, it requires setting the low-level
 TIF_SYSCALL_TRACE on a thread, which makes it take a far slower path on
 system call entry and exit, and has a big impact on performance just
 from that alone.  Global tracing has to set this individually on every
 thread, and then pay that big overhead across the board.

If we had utrace memory map tracing (I believe it is on your TODO list),
 systemtap wouldn't use global (or even per-thread) SYSCALL events as much.

...

 I'd kind of prefer to exclude REAP events for global tracing.

Currently systemtap only uses DEATH events, so I don't have much of an
opinion there.

...

 4. So, what's the plan?
 
 I need folks who might use global tracing to answer these questions:
 
a. Do we want it?

Yes.  Systemtap currently does global tracing now, in a manner similar
to crash-suspend.c.  The code looks for global CLONE, EXEC, and DEATH
events, so systemtap knows when threads come and go.  Once systemtap
finds a process the user has told us he's interested in, it attaches
some additional per-thread engine(s).

In the future, Frank has mentioned trying to do global memory map
tracing, which would require global syscall tracing (or future global
memory map tracing).

b. Do we want it right now?

Yes.  If you need beta testers, let me know.

c. What justifies doing it in utrace (vs leaving it purely to
   tracepoints et al), to placate upstream critics?

 Please don't say, That would be nice; your reasons sound good.
 That just does not help at all.  The reasons in #2 above are ones I can
 think of, but I'm not arguing for them or for the feature.  If you want
 the feature, *you* will be justifying it to the upstream critics.  Let's
 here be as skeptical about adding the new complexity, before we decide on
 doing it, as our unsympathetic reviewers will be.

Global tracing would be *really* nice; your reasons sound *great*.
How's that? :-)

Seriously, your reasons a. (Event vocabulary clearly aligned with
utrace events), b. (Coordinated with per-task utrace callbacks), and
d. (Kernel already has checks here, so almost free) apply most
clearly to systemtap.  Systemtap doesn't currently change outcomes in a
callback, so reason c. doesn't apply much.  Systemtap is interested in
performance impacts and the a./b. advantages seem quite obvious to me.
Avoiding the complexities of manually attaching/detaching to every
thread in the system seems important also.

-- 
David Smith
[EMAIL PROTECTED]
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)



Re: global tracing

2008-08-04 Thread Roland McGrath
  2. Why do we want utrace global tracing?
 
 From a systemtap point of view, we'd certainly use global tracing.

You're using tracepoints/markers too.  (You'll use anything, you minx.)
What we need is reasons for this to be a utrace feature.

 Global tracing would be *really* nice; your reasons sound *great*.
 How's that? :-)

Cursing me with loud praise!

 Seriously, your reasons a. (Event vocabulary clearly aligned with
 utrace events), b. (Coordinated with per-task utrace callbacks), and
 d. (Kernel already has checks here, so almost free) apply most
 clearly to systemtap.  Systemtap doesn't currently change outcomes in a
 callback, so reason c. doesn't apply much.  Systemtap is interested in
 performance impacts and the a./b. advantages seem quite obvious to me.

Ok.  Since a. is basically aesthetic, I think what would be concrete here
is to see how you'd use it in practice such that b. matters to you.

 Avoiding the complexities of manually attaching/detaching to every
 thread in the system seems important also.

That's a reason to have some kind of global tracing as opposed to none.
Sold.  It's not a reason to have utrace global tracing instead of only
tracepoints and markers.


Thanks,
Roland