RCU, reference counts
Something that's always been an issue in the utrace interface is the management of struct utrace_attached_engine. It's tricky that you have to use RCU and/or follow picayune rules in the attach+set_events-callbacks sequence to be sure your engine pointer is valid. From the beginning, I expected to revisit this aspect of the interface and think it through again at some point. But I hadn't really thought about it in a long time now. The discussion here on the asynchronous detach problem has helped me think through the details that impinge on the concerns that had led me to write it this way originally. I'm grateful to David and Ananth for the detailed exchange that helps air these things out. Please keep it up! We've been talking about all these corner cases with races, where I list all the caveats for interactions with detach. What these all come down to is that your engine pointer might have been freed. RCU can protect against that, but it gets sticky if you want to block. The common mechanism for dealing with this kind of issue in interfaces is reference counted objects. In the original implementation, I was resistant to using reference counts on engines for a few reasons. One reason was the idea that the interface could make it hard to leak. I do still think it's good that e.g. the interface not encourage you to hold task refs (i.e. you don't need them because you'll get a final report_reap callback before the struct goes away). But for the engine structs, the option opposite leaks is to make the failure mode likely in sloppy modules that they use a dangling engine pointer. That is most likely more dangerous than leaks, and especially likely to do something bad to someone else's engine (since the pointer will be reused). So that objection to reference counts just doesn't hold any water. Other concerns had to do with having lockless reporting loops, and originally that seemed to require RCU for the structs anyway. But that part of the implementation has been straightened out since. How we manage the lockless list now is completely independent of RCU lists. I've also objected to reference counts being touched for reporting, and I still do. But they don't have to be. The callback synchronization thing is simpler when we have an engine pointer that can be relied on, even when you block. I'm guessing everyone will be happy to see reference counts replace RCU in the interface for dealing with struct utrace_attached_engine. So here is what I'm thinking: utrace_attach returns a ref for the caller. You can use utrace_engine_get(engine) and utrace_engine_put(engine) to manage the pointer. When you drop the last ref, the struct is freed. There are no release/destructor callbacks for that, the refcount is purely for the purpose of keeping the pointer valid. An implicit ref is held while the engine is attached. The engine pointer can always be used during a callback because of this ref. Your callback can add a new ref even if you had dropped all other refs. If a callback returns UTRACE_DETACH, then this ref will be dropped (and might be the last). Since it's easy to keep the engine pointer on hand, we really only need one synchronizing call. So now there will be no *_sync calls after all. utrace_set_events and utrace_control(,UTRACE_DETACH) can return -EINPROGRESS as I described before. This means the change of event mask, or the detach, will take effect--but there might be a callback already in progress or unavoidably on its way. To block until that callback is done, you can call utrace_barrier. So now the safe detach procedure would be: ret = utrace_control(task, engine, UTRACE_DETACH); if (ret == -EINPROGRESS) ret = utrace_barrier(task, engine); if (ret == -ERESTARTSYS) /* interrupted wait, maybe callback pending */ return ret; utrace_engine_put(engine); I haven't really tested the new code at all. But this took not much work at all to change in the implementation. If this is the right direction, I'll give the code a modicum of testing and update the documentation bits after another once-over on Monday. Thanks, Roland
Re: global tracing
Roland McGrath wrote: We've mentioned global tracing. I think it's time now to discuss it thoroughly and decide what we do or don't want to do. ... 2. Why do we want utrace global tracing? From a systemtap point of view, we'd certainly use global tracing. ... 3. What would it look like? Global engines' callbacks all run after all per-task engine callbacks. (This could change in future.) I guess in a perfect world callbacks would still be called in the order they were attached. But, if calling the global callbacks last makes things easier, I think systemtap could handle it. I had originally planned to rule out SYSCALL events for global tracing. The reason is that this is not like other event checks where a simple flag gets checked cheaply. Instead, it requires setting the low-level TIF_SYSCALL_TRACE on a thread, which makes it take a far slower path on system call entry and exit, and has a big impact on performance just from that alone. Global tracing has to set this individually on every thread, and then pay that big overhead across the board. If we had utrace memory map tracing (I believe it is on your TODO list), systemtap wouldn't use global (or even per-thread) SYSCALL events as much. ... I'd kind of prefer to exclude REAP events for global tracing. Currently systemtap only uses DEATH events, so I don't have much of an opinion there. ... 4. So, what's the plan? I need folks who might use global tracing to answer these questions: a. Do we want it? Yes. Systemtap currently does global tracing now, in a manner similar to crash-suspend.c. The code looks for global CLONE, EXEC, and DEATH events, so systemtap knows when threads come and go. Once systemtap finds a process the user has told us he's interested in, it attaches some additional per-thread engine(s). In the future, Frank has mentioned trying to do global memory map tracing, which would require global syscall tracing (or future global memory map tracing). b. Do we want it right now? Yes. If you need beta testers, let me know. c. What justifies doing it in utrace (vs leaving it purely to tracepoints et al), to placate upstream critics? Please don't say, That would be nice; your reasons sound good. That just does not help at all. The reasons in #2 above are ones I can think of, but I'm not arguing for them or for the feature. If you want the feature, *you* will be justifying it to the upstream critics. Let's here be as skeptical about adding the new complexity, before we decide on doing it, as our unsympathetic reviewers will be. Global tracing would be *really* nice; your reasons sound *great*. How's that? :-) Seriously, your reasons a. (Event vocabulary clearly aligned with utrace events), b. (Coordinated with per-task utrace callbacks), and d. (Kernel already has checks here, so almost free) apply most clearly to systemtap. Systemtap doesn't currently change outcomes in a callback, so reason c. doesn't apply much. Systemtap is interested in performance impacts and the a./b. advantages seem quite obvious to me. Avoiding the complexities of manually attaching/detaching to every thread in the system seems important also. -- David Smith [EMAIL PROTECTED] Red Hat http://www.redhat.com 256.217.0141 (direct) 256.837.0057 (fax)
Re: global tracing
2. Why do we want utrace global tracing? From a systemtap point of view, we'd certainly use global tracing. You're using tracepoints/markers too. (You'll use anything, you minx.) What we need is reasons for this to be a utrace feature. Global tracing would be *really* nice; your reasons sound *great*. How's that? :-) Cursing me with loud praise! Seriously, your reasons a. (Event vocabulary clearly aligned with utrace events), b. (Coordinated with per-task utrace callbacks), and d. (Kernel already has checks here, so almost free) apply most clearly to systemtap. Systemtap doesn't currently change outcomes in a callback, so reason c. doesn't apply much. Systemtap is interested in performance impacts and the a./b. advantages seem quite obvious to me. Ok. Since a. is basically aesthetic, I think what would be concrete here is to see how you'd use it in practice such that b. matters to you. Avoiding the complexities of manually attaching/detaching to every thread in the system seems important also. That's a reason to have some kind of global tracing as opposed to none. Sold. It's not a reason to have utrace global tracing instead of only tracepoints and markers. Thanks, Roland