Re: How can I use @cast in a user-space program, or is there another way to assign values through a user-space pointer to a structure?

2012-03-27 Thread David Smith
Liu,

This is really a question for the systemtap list, not the utrace list.
Forwarding there.  Also see possible answer below.

On 03/26/2012 10:06 PM, Liu Tianhao wrote:

 I have a problem casting a pointer to a structure in the user-space program.
 It always reports “ERROR: kernel write fault at 0x00400675 (addr) near 
 identifier '@cast' at test.stp:3:8”.
 
 Compile the source file and execute the stap command.
 liuth@liuthivb:~/$ gcc -g -o test test.c
 liuth@liuthivb:~/$ sudo stap -w -vg test.stp -c ./test
 Pass 1: parsed user script and 81 library script(s) using 49344virt/22060res/2024shr kb, in 130usr/0sys/125real ms.
 Pass 2: analyzed script: 2 probe(s), 9 function(s), 0 embed(s), 0 global(s) using 51992virt/23168res/2540shr kb, in 10usr/0sys/5real ms.
 Pass 3: using cached /home/liuth/.systemtap/cache/5c/stap_5c288dc4a44724d509924f222aedb626_9050.c
 Pass 4: using cached /home/liuth/.systemtap/cache/5c/stap_5c288dc4a44724d509924f222aedb626_9050.ko
 Pass 5: starting run.
 hello world
 call--call
 The value of a:[F] The value of b:[10]
 call--call
 ERROR: kernel write fault at 0x004005b5 (addr) near identifier '@cast' at test.stp:3:8
 Pass 5: run completed in 10usr/0sys/589real ms.
 Pass 5: run failed.  Try again with another '--vp 1' option.
 
 I have modified the test.stp as follows.
 probe process("/home/liuth/worksource/ddtv/tracedrv/java/DDTVConfig/test").function("funcStruct").call
 {
// compilation error
// @cast($pStruct, struct TestStruct,   test.h )->a = 31
//@cast($pStruct, struct TestStruct,   test.h )->b = 32
 
// ERROR: kernel write fault at 0x004005b5 (addr) near identifier '@cast' at test.stp:3:8
//@cast($pStruct, struct TestStruct,   test.h )->a = 31
//@cast($pStruct, struct TestStruct,   test.h )->b = 32
 
//  ERROR: kernel read fault at 0x0020001f (addr) near identifier '$pStruct' at test.stp:5:60
//@cast($pStruct, struct TestStruct,   test.h )->a = 31
//@cast($pStruct, struct TestStruct,   test.h )->b = 32
 
@cast($pStruct, "struct TestStruct")->a = 31
@cast($pStruct, "struct TestStruct")->b = 32
printf("The value of a:[%X] The value of b:[%X]\n", $pStruct->a, $pStruct->b)
 }


Hmm, what happens when you just use the pointer directly, like this:

$pStruct->a = 31
$pStruct->b = 32


 The following are the program and the script.

 Header file test.h:
 #include <stdlib.h>
 #include <stdio.h>
 typedef struct TestStruct
 {
int a;
int b;
 }ST_Test_Struct;
 
 //int  func(int a, int b, int c)
 int  func(ST_Test_Struct tmpStruct);
 int funcStruct(ST_Test_Struct* pStruct);
 
 source file test.c:
 #include "test.h"
 int  func(ST_Test_Struct tmpStruct)
 {
  return tmpStruct.a + tmpStruct.b;
 }
 
 int funcStruct(ST_Test_Struct* pStruct)
 {
 return pStruct->a + pStruct->b;
 }
 
 int main(int argc, char** argv)
 {
 ST_Test_Struct tmpStruct = { 1,2 };
 func(tmpStruct);
 funcStruct(&tmpStruct);
 printf("hello world\n");
 return 0;
 }
 
 script  test.stp:
 probe process("/home/liuth/worksource/ddtv/tracedrv/java/DDTVConfig/test").function("funcStruct").call
 {
@cast($pStruct, "struct TestStruct")->a = 31
@cast($pStruct, "struct TestStruct")->b = 32
printf("The value of a:[%X] The value of b:[%X]\n", $pStruct->a, $pStruct->b)
 }
 probe process("/home/liuth/worksource/ddtv/tracedrv/java/DDTVConfig/test").function("func").call
 {
printf("call--call\n")
$tmpStruct->a = 15;
$tmpStruct->b = 16;
printf("The value of a:[%X] The value of b:[%X]\n", $tmpStruct->a, $tmpStruct->b)
printf("call--call\n")
 }
 
 



-- 
David Smith
dsm...@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)



Re: [PATCH 1/4] ptrace: temporary revert the recent ptrace/jobctl rework

2011-06-21 Thread David Smith
On 06/21/2011 10:25 AM, Oleg Nesterov wrote:
 OK, I won't argue. So we need to rework utrace/ptrace in 3.0, then we
 should do this again in 3.1. I'll try to do something.

I have a thought here, though I'm not familiar enough with utrace internals
to know whether it is a good one.

Originally, implementing ptrace via utrace was optional - if I remember
correctly there was a define that turned it off and on.  One of the good
things about implementing ptrace-via-utrace is that they co-existed well
- you could ptrace a process that you were also using utrace on.  If I
remember correctly, during one of the utrace reviews, making
ptrace-via-utrace optional was requested to be removed.

With the recent ptrace changes, continuing to implement ptrace-via-utrace
will take some work.

OK, so here's my (hacky) idea:
(1) Forget ptrace-via-utrace.  Have utrace be a separate thing.  This
way the recent ptrace changes won't matter.
(2) But, what about ptrace co-existing well with utrace?  Make them
mutually exclusive - a ptraced-process can't be utraced and a
utraced-process can't be ptraced.

Assuming the above is a semi-reasonable idea, it might be a lot less
work than updating the ptrace-via-utrace code to handle the new ptrace
changes.
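A minimal userspace sketch of the exclusion rule in (2) may make the idea concrete. The structure, the function names, and the error codes (-EBUSY, -EPERM) are hypothetical illustrations, not taken from the ptrace or utrace code, and a real version would need locking against concurrent attaches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-task bookkeeping; not a kernel structure. */
struct task_trace_state {
	bool ptraced;        /* a ptrace tracer is attached */
	int utrace_engines;  /* number of attached utrace engines */
};

/* Invariant of proposal (2): never ptraced and utraced at once. */
static void check_exclusive(const struct task_trace_state *t)
{
	assert(!(t->ptraced && t->utrace_engines > 0));
}

/* ptrace attach is refused while any utrace engine is attached.
 * Single-threaded illustration: no locking. */
static int try_ptrace_attach(struct task_trace_state *t)
{
	if (t->ptraced)
		return -EPERM;       /* already ptraced (assumed error code) */
	if (t->utrace_engines > 0)
		return -EBUSY;       /* utraced tasks can't be ptraced */
	t->ptraced = true;
	check_exclusive(t);
	return 0;
}

/* utrace attach is refused while the task is ptraced. */
static int try_utrace_attach(struct task_trace_state *t)
{
	if (t->ptraced)
		return -EBUSY;       /* ptraced tasks can't be utraced */
	t->utrace_engines++;
	check_exclusive(t);
	return 0;
}
```

Several utrace engines can still share one task in this sketch; only the ptrace/utrace combination is refused.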




Re: resuming after stop at syscall_entry

2009-04-28 Thread David Smith
/.  Those can illustrate things with good
 comments, and also could be built verbatim to load multiple
 ones/instances in different orders and demonstrate what happens, etc.

The wiki would be fine - just somewhere that people could see this stuff.

 It would be nice to have folks like you and Renzo work up this text
 and/or examples.  What's needed is stuff that makes sense to you guys
 as users of the API, rather than what makes sense to me who has
 thought too much already about all this stuff.

We should probably just dump your email into the wiki.




Re: syscall tracing overheads: utrace vs. kprobes

2009-04-28 Thread David Smith
Frank Ch. Eigler wrote:
 Hi -
 
 In a few contexts, it comes up as to whether it is faster to probe
 process syscalls with kprobes or with something higher level such as
 utrace.  (There are other hypothetical options too (per-syscall
 tracepoints) that could be measured this way in the future.)  

These scenarios are a bit wrong:

 Now we compare these scenarios:
 
 # stap -e 'probe never {}' -t --vp 1 -c a.out
 
 Here, no actual probing occurs so we get a measurement of the plain
 uninstrumented run time of ten million close(2)s.

The above one is fine.

 # stap -e 'probe process.syscall {}' -t --vp 1 -c a.out
 
 Here, we intercept sys_close with a kprobe.  If the system is not too
 busy, we should pick up only the close(2)s coming from a.out, though a
 few close(2)'s executed by other processes may show up.
 
 # stap -e 'probe syscall.close {}' -t --vp 1 -c a.out
 
 Here, we intercept all a.out's syscalls with utrace.  Other processes
 are not affected at all, but other syscalls by a.out would be --
 though in our test, there are hardly any of those.

These 2 are swapped:  the 'process.syscall' probe is a utrace-based
probe and the 'syscall.close' probe is a kprobe-based probe.

Note that in the results, the description and probe types matched correctly.

 Some typical results on my 2.66GHz 2*Xeon5150 machine running Fedora 9 -
 2.6.27.12:
 
 never:  
 Pass 5: run completed in 740usr/3310sys/4155real ms.
 
 kprobe: 
 probe syscall.close (input:1:1), hits: 1028, cycles: 176min/202avg/3632max
 Pass 5: run completed in 750usr/9320sys/10193real ms.
 
 utrace: 
 probe process.syscall (input:1:1), hits: 1025, cycles: 176min/209avg/184392max
 Pass 5: run completed in 1670usr/6860sys/8645real ms.
 
 So utrace added 4.5 seconds, and kprobes added 6.0 seconds to the
 uninstrumented 4.1 second run time.  But wait: we should subtract the
 time taken by the probe handler itself: 200ish cycles at 2.66 GHz,
 which is about 0.75 seconds.  So the overheads are approximately:
 
 never: n/a
 kprobe: 5.2 seconds = 0.52 us per hit
 utrace: 3.6 seconds = 0.36 us per hit
 
 
 Note that these are microbenchmarks that represent an ideal case
 compared to a larger run, since they probably fit comfily inside
 caches.  They probably also undercount the probe handler's run time.
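Frank's overhead arithmetic can be re-checked with a few lines of C. This is a sketch that assumes, as the text does, exactly ten million probe hits and a flat 200 cycles per handler at 2.66 GHz; the function names are made up for illustration.

```c
#include <assert.h>

/* Assumptions stated in the text: ten million close(2) calls,
 * a 2.66 GHz clock, and roughly 200 cycles per probe handler. */
#define CALLS          10000000.0
#define CPU_HZ         2.66e9
#define HANDLER_CYCLES 200.0

/* Extra wall-clock seconds compared with the uninstrumented run. */
static double added_seconds(double probed_ms, double baseline_ms)
{
	assert(probed_ms >= baseline_ms);
	return (probed_ms - baseline_ms) / 1000.0;
}

/* Seconds spent inside the probe handlers themselves. */
static double handler_seconds(void)
{
	return CALLS * HANDLER_CYCLES / CPU_HZ;
}

/* Remaining per-hit overhead in microseconds once handler time is removed. */
static double overhead_us_per_hit(double probed_ms, double baseline_ms)
{
	double net = added_seconds(probed_ms, baseline_ms) - handler_seconds();
	return net / CALLS * 1e6;
}
```

With the quoted real times (4155 ms uninstrumented, 10193 ms kprobe, 8645 ms utrace), added_seconds() gives about 6.04 s and 4.49 s, handler_seconds() about 0.75 s, and overhead_us_per_hit() about 0.53 us and 0.37 us. That is slightly above the rounded 0.52 us and 0.36 us figures, with the same ranking.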




Re: resuming after stop at syscall_entry

2009-04-22 Thread David Smith
Roland McGrath wrote:

This processing makes sense I think.  It is a bit complicated of course,
but not unnecessarily so.

I'd like to ask you how this stuff would relate to systemtap (so I've
added the systemtap mailing list).  I've interspersed a few
comments/questions below.

... stuff deleted ...
 SYSCALL_ENTRY is unlike all other events.  Right after this callback
 loop is when the important user-visible stuff happens (the system call).
 So we stop immediately there as for the other two.  But, if another
 engine used UTRACE_STOP and maybe did something asynchronously, like
 modifying the syscall argument registers, you get no opportunity to see
 what happened.  Once all engines lift UTRACE_STOP, the system call runs.

... stuff deleted ...

 As explained above, the norm of interacting with other engines and their
 use of UTRACE_STOP is to use the final report.  When your callback's
 action argument includes UTRACE_STOP, you know an earlier engine might
 be fiddling before the thread resumes.  So, your callback can decide to
 return UTRACE_REPORT.  That ensures that some report_quiesce (or
 report_signal/UTRACE_SIGNAL_REPORT) callback will be made after the
 other engine lifts its UTRACE_STOP and before user mode.  At that point,
 you can see what user register values it might have installed, etc.  In
 all events but syscall entry, a final report_quiesce(0) serves this need.
 
 My proposal is to extend this resume report approach to the syscall
 entry case.  That is, after when some report_syscall_entry returned
 UTRACE_STOP so we've stopped, allow for a second reporting pass after
 we've been resumed, before running the system call.  You'd get this pass
 if someone used UTRACE_REPORT.  That is, in the first callback loop, one
 engine used UTRACE_STOP and another used UTRACE_REPORT.  Then when the
 first engine used utrace_control() to resume, there would be a second
 reporting pass because of the second engine's earlier request.  Or, even
 if there was just one engine, but it used UTRACE_STOP and then used
 utrace_control(UTRACE_REPORT) to resume, then it would get the second
 reporting pass.  If someone uses UTRACE_STOP+UTRACE_REPORT in that pass,
 there would be a third pass, etc.
 
 What I have in mind is that the second (and however many more) pass
 would just be another report_syscall_entry callback to everyone with
 UTRACE_EVENT(SYSCALL_ENTRY) set.  A flag bit in the action argument says
 this is a repeat notification.
 
 I think this strikes a decent balance of not adding more callbacks and
 more arguments to bloat the API in general, while imposing a fairly
 simple burden on engines to avoid getting confused by multiple calls.
 
 A tracing-only engine that just wants to see the syscall that is going
 to be done can just do:
 
   if (utrace_resume_action(action) == UTRACE_STOP)
     return UTRACE_REPORT;
 
 at the top of report_syscall_entry, so it just doesn't think about it
 until it thinks the call will go now through.  

Systemtap currently doesn't support changing syscall arguments; if it ever
does, a few things would obviously need to change.

But I think systemtap would probably fall into this category: it only wants
to see the syscall that is actually going to be done.  So systemtap could
possibly get multiple callbacks for the same syscall, but only pay
attention to the last one, correct?

 Say an engine has a different agenda, just to see what syscall argument
 values came in from user mode before someone else changes them.  It does:
 
   if (action & UTRACE_SYSCALL_RESUMED)
     return UTRACE_RESUME;
 
 to ignore the additional callbacks that might come after somebody
 decided to stop and report.  It just does its work on the first one.
 
 Here comes Renzo again!  He wants to have two or three or nineteen
 layers of the first kind of Renzo engine: each one stops at syscall
 entry, then resumes after changing some registers.  He wants these to
 nest, meaning that after the outermost one stops, fiddles, and
 resumes, the next one in stops, looks at the register as fiddled by
 the outermost guy, fiddles in a different way, and resumes, and on and
 on.  Perhaps the first model (if last guy is stopping, punt to look
 again at resume report) works for that.  Or perhaps the engine also
 needs to keep track with its own state flag it sets whenever it does its
 work, and then resets in exit tracing to prepare for next time.

... stuff deleted ...

 So, even I can't write that much text and still think this interface
 choice is simple to understand.  But I kind of think it's around as
 simple as it can be for its mandates.  I'd appreciate any feedback.

This is understandable, but does hurt my head a *little* bit.  I think
if you put the above full text somewhere and provided some examples this
would make sense to people.
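As one possible example of that kind, here is a small userspace model of the multi-pass scheme as described above, with three engines: one that stops and rewrites the argument, one that only wants the values from user mode, and a tracing-only engine (the systemtap case). Everything in it (the ACT_* values, SYSCALL_RESUMED, run_syscall_entry(), and the rule that the lowest action value wins) is a hypothetical stand-in, not the utrace API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for UTRACE_STOP, UTRACE_REPORT, UTRACE_RESUME
 * and UTRACE_SYSCALL_RESUMED.  Lower value = stronger request. */
enum act { ACT_STOP = 0, ACT_REPORT = 1, ACT_RESUME = 2 };
#define ACT_MASK        0x0ffu
#define SYSCALL_RESUMED 0x100u

struct model {
	long arg;         /* syscall argument register */
	long first_seen;  /* value recorded by the "first look" engine */
	long final_seen;  /* value recorded by the tracing-only engine */
	int passes;       /* reporting passes run so far */
};

/* Stops on the first pass so it can rewrite the argument. */
static unsigned fiddler(unsigned action, struct model *m)
{
	(void)m;
	return (action & SYSCALL_RESUMED) ? ACT_RESUME : ACT_STOP;
}

/* Wants the values as they came in from user mode. */
static unsigned first_look(unsigned action, struct model *m)
{
	if (action & SYSCALL_RESUMED)
		return ACT_RESUME;           /* ignore repeat notifications */
	m->first_seen = m->arg;
	return ACT_RESUME;
}

/* Wants only the call that is really going to run. */
static unsigned tracer(unsigned action, struct model *m)
{
	if ((action & ACT_MASK) == ACT_STOP)
		return ACT_REPORT;           /* look again after the stop */
	m->final_seen = m->arg;
	return ACT_RESUME;
}

/* Run reporting passes until nobody stops; while stopped, the argument
 * is rewritten to new_arg, as the stopping engine would do. */
static void run_syscall_entry(struct model *m, long new_arg)
{
	unsigned (*engines[])(unsigned, struct model *) = {
		fiddler, first_look, tracer
	};
	unsigned flags = 0;

	for (;;) {
		unsigned agg = ACT_RESUME;   /* aggregate of earlier engines */
		bool report = false;

		m->passes++;
		assert(m->passes < 16);      /* guard against endless passes */
		for (unsigned i = 0; i < sizeof engines / sizeof engines[0]; i++) {
			unsigned r = engines[i](agg | flags, m);
			if (r == ACT_REPORT)
				report = true;
			if (r < agg)
				agg = r;
		}
		if (agg != ACT_STOP)
			return;                  /* nobody stopped: syscall runs */
		m->arg = new_arg;            /* stopping engine fiddles, resumes */
		if (!report)
			return;                  /* no second look requested */
		flags = SYSCALL_RESUMED;     /* repeat pass, flagged as such */
	}
}
```

Starting from an argument of 3 that gets rewritten to 42, the first-look engine records 3, the tracing-only engine records 42, and two passes run. Under this model's assumptions, that is the "act only on the last callback" behaviour asked about above.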




UTRACE_STOP in exec handler

2008-09-16 Thread David Smith
Roland,

I'm seeing a problem with the new utrace on
2.6.27-0.323.rc6.fc10.x86_64.  Basically, if I attach a new engine in an
exec handler, and use utrace_control() with UTRACE_STOP, the task
doesn't reliably stop.

I was seeing this behaviour with systemtap, so I started with
crash-suspend.c and mangled it to do something similar, to hopefully
simplify the problem down a bit.

So, try compiling this module, then insmod it with the pid of a bash
process.  What I expected to happen is that every new process exec'ed by
that bash shell gets stopped once.  Instead, ls will run to completion
without stopping. If you run cat, then use Ctrl-C to kill it, you'll
get a quiesce event.

So, is this a problem with the way I'm attempting to stop the thread or
a utrace bug?

(Note that I've tried returning UTRACE_STOP from the exec handler
instead of calling utrace_control(), but it makes no difference.)

#include <linux/sched.h>
#include <linux/pid.h>
#include <linux/utrace.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/errno.h>

MODULE_DESCRIPTION("automatic suspend on crash");
MODULE_LICENSE("GPL");

static int target_pid;
static int verbose;

module_param_named(pid, target_pid, int, 0);
module_param(verbose, bool, 0);

#define MY_EVENTS (UTRACE_EVENT(CLONE) | UTRACE_EVENT(EXEC))

#define MY_EVENTS2 (UTRACE_EVENT(QUIESCE))

static u32
crash_suspend_quiesce(u32 action, struct utrace_attached_engine *engine,
		  struct task_struct *tsk, unsigned long event)
{

	if (tsk != NULL) {
		printk("pid %d quiesced\n", tsk->pid);
	}

	return UTRACE_DETACH;
}

static const struct utrace_engine_ops crash_suspend_ops2 =
{
	.report_quiesce = crash_suspend_quiesce,
};

/*
 * On clone, attach to the child.
 */
static u32
crash_suspend_clone(enum utrace_resume_action action,
		struct utrace_attached_engine *engine,
		struct task_struct *parent,
		unsigned long clone_flags,
		struct task_struct *child)
{
	struct utrace_attached_engine *child_engine;

	child_engine = utrace_attach_task(child, UTRACE_ATTACH_CREATE,
					  engine->ops, 0);
	if (IS_ERR(child_engine)) {
		printk("attach to clone child %d (%lx) from 0x%p = %ld\n",
		       child->pid, clone_flags, engine, PTR_ERR(child_engine));
	} else {
		int err = utrace_set_events(child, child_engine, MY_EVENTS);
		WARN_ON(err);

		utrace_engine_put(child_engine);
	}

	return UTRACE_RESUME;
}

static u32
crash_suspend_exec(enum utrace_resume_action action,
		   struct utrace_attached_engine *engine,
		   struct task_struct *tsk,
		   const struct linux_binfmt *fmt,
		   const struct linux_binprm *bprm,
		   struct pt_regs *regs)
{
	struct utrace_attached_engine *engine2;

	engine2 = utrace_attach_task(tsk, UTRACE_ATTACH_CREATE,
				     &crash_suspend_ops2, 0);
	if (IS_ERR(engine2)) {
		printk("attach to exec %d from 0x%p = %ld\n",
		       tsk->pid, engine2, PTR_ERR(engine2));
	} else {
		int err = utrace_set_events(tsk, engine2, MY_EVENTS2);
		WARN_ON(err);

		err = utrace_control(tsk, engine2, UTRACE_STOP);
		printk("utrace_control(%d) returned %d\n",
		       (int)tsk->pid, err);

		utrace_engine_put(engine2);
	}

	return UTRACE_RESUME;
}

/*
 * If we are still attached at task death, it didn't die by core dump signal.
 * Just detach and let it go.
 */
static u32
crash_suspend_death(struct utrace_attached_engine *engine,
		struct task_struct *tsk,
		bool group_dead, int signal)
{
	return UTRACE_DETACH;
}

static const struct utrace_engine_ops crash_suspend_ops =
{
	.report_clone = crash_suspend_clone,
	.report_exec = crash_suspend_exec,
};

static int __init init_crash_suspend(void)
{
	struct pid *pid;
	struct utrace_attached_engine *engine;
	int ret;

	pid = find_get_pid(target_pid);
	if (pid == NULL) {
		printk("cannot find PID %d\n", target_pid);
		return -ESRCH;
	}

	engine = utrace_attach_pid(pid, UTRACE_ATTACH_CREATE,
				   &crash_suspend_ops, 0);
	if (IS_ERR(engine))
		printk("utrace_attach: %ld\n", PTR_ERR(engine));
	else if (engine == NULL)
		printk("utrace_attach = null!\n");
	else
		printk("attached to %d = 0x%p\n", pid_vnr(pid), engine);

	ret = utrace_set_events_pid(pid, engine, MY_EVENTS);
	if (ret == -ESRCH)
		printk("pid %d died during setup\n", pid_vnr(pid));
	else
		WARN_ON(ret);

	put_pid(pid);
	if (engine && !IS_ERR(engine))
		utrace_engine_put(engine);

	return 0;
}

static void __exit exit_crash_suspend(void)
{
	struct task_struct *t;
	struct utrace_attached_engine *engine;
	int n = 0;
	int ret;

restart:
	rcu_read_lock();
	for_each_process(t) {
		engine = utrace_attach_task(t, UTRACE_ATTACH_MATCH_OPS,
					    &crash_suspend_ops, 0);
		if (IS_ERR(engine)) {
			int error = -PTR_ERR(engine);
			if (error != ENOENT)
				printk("!!! utrace_attach returned %d on %d\n",
				       error, t->pid);
			continue;
		}

		ret = utrace_control(t, engine, UTRACE_DETACH);
		if (ret == -EINPROGRESS) {
			/*
			 * It's running our callback, so we have

utrace_set_events in quiesce handler?

2008-09-02 Thread David Smith
In systemtap, we've changed to stopping a thread before setting up the
events we're interested in (besides quiesce/death).  So, basically, it
looks like this (without much error handling):

// initial attach logic
// ... find an interesting thread ...
ops.report_quiesce = quiesce_handler;
ops.report_syscall_entry = syscall_entry_handler;

engine = utrace_attach_task(tsk, UTRACE_ATTACH_CREATE, &ops, data);
rc = utrace_set_events(tsk, engine,
(UTRACE_EVENT(DEATH) | UTRACE_STOP | UTRACE_EVENT(QUIESCE)));

// ... do other stuff ...


// quiesce handler
u32
quiesce_handler(enum utrace_resume_action action,
		struct utrace_attached_engine *engine,
		struct task_struct *tsk,
		unsigned long event)
{
	int rc;

	// Turn off quiesce handling and turn on syscall handling
	rc = utrace_set_events(tsk, engine,
			       UTRACE_EVENT(DEATH) | UTRACE_EVENT(SYSCALL_ENTRY));
	if (rc == -EINPROGRESS) {
		rc = utrace_barrier(tsk, engine);
		if (rc != 0)
			printk(KERN_ERR
			       "utrace_barrier returned error %d on pid %d",
			       rc, (int)tsk->pid);
		rc = utrace_set_events(tsk, engine,
				       UTRACE_EVENT(DEATH) | UTRACE_EVENT(SYSCALL_ENTRY));
	}
	if (rc != 0)
		printk(KERN_ERR
		       "utrace_set_events returned error %d on pid %d",
		       rc, (int)tsk->pid);

	// ... do other stuff ...
	return UTRACE_RESUME;
}

The utrace_barrier() call always returns 0, but the utrace_set_events()
calls always return -EINPROGRESS.  I've put the -EINPROGRESS logic in a
loop, but even after 10 iterations utrace_set_events() never succeeds.
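The retry loop itself is not shown; a bounded version might look like the userspace sketch below. set_events_fn, barrier_fn, and the two stubs are hypothetical stand-ins for utrace_set_events() and utrace_barrier(), so this only illustrates the shape of the loop and does not explain why the real call keeps returning -EINPROGRESS.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for utrace_set_events() and utrace_barrier(),
 * passed as function pointers so the loop runs in userspace.  Each
 * returns 0 or a negative errno value. */
typedef int (*set_events_fn)(void *ctx);
typedef int (*barrier_fn)(void *ctx);

/* Call set_events(); on -EINPROGRESS, wait with barrier() and retry,
 * making at most max_tries calls to set_events(). */
static int set_events_retry(set_events_fn set_events, barrier_fn barrier,
			    void *ctx, int max_tries)
{
	int rc;

	assert(max_tries > 0);
	rc = set_events(ctx);
	for (int i = 1; i < max_tries && rc == -EINPROGRESS; i++) {
		int brc = barrier(ctx);
		if (brc != 0)
			return brc;          /* give up if the barrier fails */
		rc = set_events(ctx);
	}
	return rc;
}

/* Example stub: fails with -EINPROGRESS while *busy is positive. */
static int stub_set_events(void *ctx)
{
	int *busy = ctx;

	if (*busy > 0) {
		(*busy)--;
		return -EINPROGRESS;
	}
	return 0;
}

/* Example stub: a barrier that always returns 0, as reported above. */
static int stub_barrier(void *ctx)
{
	(void)ctx;
	return 0;
}
```

With a stub that is busy twice, the loop succeeds on the third call; with one that never stops returning -EINPROGRESS, it gives up after max_tries calls, which is the symptom described here.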

This is on 2.6.27-0.287.rc4.git7.fc10.x86_64.

So, am I doing this incorrectly?  Or is there a bug here in utrace
(where it doesn't expect to see a utrace_set_events() from within a
handler on the same thread)?  If I'm doing this incorrectly, I'd like
help in figuring out what I should be doing.

(In the original utrace there was UTRACE_ACTION_NEWSTATE which allowed
you to change the flags from the handler, but I haven't seen anything
similar in the current utrace.)

Thanks for the help.




Re: global tracing

2008-08-04 Thread David Smith
Roland McGrath wrote:
 We've mentioned global tracing.  I think it's time now to discuss it
 thoroughly and decide what we do or don't want to do.

...

 2. Why do we want utrace global tracing?

From a systemtap point of view, we'd certainly use global tracing.

...

 3. What would it look like?
 
 Global engines' callbacks all run after all per-task engine callbacks.
 (This could change in future.)

I guess in a perfect world callbacks would still be called in the
order they were attached.  But, if calling the global callbacks last
makes things easier, I think systemtap could handle it.

 I had originally planned to rule out SYSCALL events for global tracing.
 The reason is that this is not like other event checks where a simple
 flag gets checked cheaply.  Instead, it requires setting the low-level
 TIF_SYSCALL_TRACE on a thread, which makes it take a far slower path on
 system call entry and exit, and has a big impact on performance just
 from that alone.  Global tracing has to set this individually on every
 thread, and then pay that big overhead across the board.

If we had utrace memory map tracing (I believe it is on your TODO list),
systemtap wouldn't use global (or even per-thread) SYSCALL events as much.

...

 I'd kind of prefer to exclude REAP events for global tracing.

Currently systemtap only uses DEATH events, so I don't have much of an
opinion there.

...

 4. So, what's the plan?
 
 I need folks who might use global tracing to answer these questions:
 
a. Do we want it?

Yes.  Systemtap already does global tracing, in a manner similar
to crash-suspend.c.  The code looks for global CLONE, EXEC, and DEATH
events, so systemtap knows when threads come and go.  Once systemtap
finds a process the user has told us he's interested in, it attaches
some additional per-thread engine(s).

In the future, Frank has mentioned trying to do global memory map
tracing, which would require global syscall tracing (or future global
memory map tracing).

b. Do we want it right now?

Yes.  If you need beta testers, let me know.

c. What justifies doing it in utrace (vs leaving it purely to
   tracepoints et al), to placate upstream critics?

 Please don't say, That would be nice; your reasons sound good.
 That just does not help at all.  The reasons in #2 above are ones I can
 think of, but I'm not arguing for them or for the feature.  If you want
 the feature, *you* will be justifying it to the upstream critics.  Let's
 here be as skeptical about adding the new complexity, before we decide on
 doing it, as our unsympathetic reviewers will be.

Global tracing would be *really* nice; your reasons sound *great*.
How's that? :-)

Seriously, your reasons a. (Event vocabulary clearly aligned with
utrace events), b. (Coordinated with per-task utrace callbacks), and
d. (Kernel already has checks here, so almost free) apply most
clearly to systemtap.  Systemtap doesn't currently change outcomes in a
callback, so reason c. doesn't apply much.  Systemtap is interested in
performance impacts and the a./b. advantages seem quite obvious to me.
Avoiding the complexities of manually attaching/detaching to every
thread in the system seems important also.




Re: crash-suspend teardown races

2008-07-31 Thread David Smith
Roland McGrath wrote:
 Thanks, David.  That is exactly the right example of using kernel
 synchronization primitives with callbacks to implement blocking behaviors
 you want.  The wrinkle there is that you use UTRACE_INTERRUPT, which
 (potentially) perturbs the behavior of every traced thread.  Doing this
 gives you a simple way to do synchronous detach and avoid those races.
 It's a prime example of why asynchronous detach is harder and we need to
 hash it out.  What you've done is the only thing that's straightforward to
 do now, but it has one of the bad old side effects of ptrace (interrupting
 detach) that we need to eliminate to make the facility acceptable as the
 basis for pervasive tracing of many processes on the system.

Is there a way to avoid using UTRACE_INTERRUPT?  Certainly I'd like to
avoid disturbing the processes we're tracing.




crash-suspend teardown races

2008-07-30 Thread David Smith
For background to this email, read the teardown races section in
utrace.txt and Roland's asynchronous detach email.

The crash-suspend.c example suffers from the teardown races problem.
(Since systemtap's utrace code is a more elaborate version of
crash-suspend.c, systemtap has the same problem.)  So, I've attempted to
come up with a solution that doesn't pervert the example too much.

I've attached a patch with the details.  Basically, at module unload
time, instead of detaching directly, it tries to asynchronously stop all
the threads we're attached to and then let the quiesce handler detach
from the thread.  The semi-tricky part was letting the module unload
function, exit_crash_suspend(), know when all threads that we had
attached to were detached.  To do this, the code keeps up with an attach
count.  When the attach count reaches 0, the module unload function gets
woken up to go ahead and exit.
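The counting scheme can be tried outside the kernel. This is a userspace analogue, not the patch itself: a pthread mutex and condition variable stand in for DECLARE_WAIT_QUEUE_HEAD() and wake_up(), worker threads stand in for traced threads detaching, and every name except attach_count is made up. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int attach_count = 0;
static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wq = PTHREAD_COND_INITIALIZER;

/* After a successful attach (cf. atomic_inc(&attach_count)). */
static void note_attach(void)
{
	atomic_fetch_add(&attach_count, 1);
}

/* From the quiesce/death path when detaching from one thread. */
static void note_detach(void)
{
	int left = atomic_fetch_sub(&attach_count, 1) - 1;

	assert(left >= 0);
	if (left == 0) {
		/* The waiter holds wq_lock from its check until it sleeps,
		 * so this broadcast cannot fall between the two. */
		pthread_mutex_lock(&wq_lock);
		pthread_cond_broadcast(&wq);
		pthread_mutex_unlock(&wq_lock);
	}
}

/* Module-exit analogue: sleep until every attached thread has detached. */
static void wait_for_all_detached(void)
{
	pthread_mutex_lock(&wq_lock);
	while (atomic_load(&attach_count) > 0)
		pthread_cond_wait(&wq, &wq_lock);
	pthread_mutex_unlock(&wq_lock);
}

/* Simulated traced thread that detaches once. */
static void *detach_worker(void *unused)
{
	(void)unused;
	note_detach();
	return NULL;
}
```

Re-checking the count in a loop under the lock is what prevents a lost wakeup here; the kernel's wait_event() macro is designed to give the same guarantee with a wait queue.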

I'd appreciate any thoughts, criticisms, etc. on this patch, which I've
tested under kernel 2.6.27-0.186.rc0.git15.fc10.

--- /home/dsmith/crash-suspend.c	2008-07-30 15:36:46.0 -0500
+++ crash-suspend2.c	2008-07-30 16:01:58.0 -0500
@@ -18,6 +18,14 @@ module_param(verbose, bool, 0);
   UTRACE_EVENT(SIGNAL_CORE) | UTRACE_EVENT(JCTL) | \
   UTRACE_EVENT(QUIESCE))
 
+#define SHUTDOWN_EVENTS (UTRACE_EVENT(QUIESCE) | UTRACE_EVENT(DEATH))
+#define CS_STARTING	0
+#define CS_STOPPING	1
+atomic_t state = ATOMIC_INIT(CS_STARTING);
+atomic_t attach_count = ATOMIC_INIT (0);
+
+static DECLARE_WAIT_QUEUE_HEAD(crash_suspend_wq);
+
 /*
  * This is the interesting hook.
  */
@@ -61,6 +69,13 @@ crash_suspend_quiesce(u32 action, struct
 */
if (!event)
engine->data = NULL;
+
+   if (atomic_read(&state) == CS_STOPPING) {
+   if (atomic_dec_return(&attach_count) <= 0) {
+   wake_up(&crash_suspend_wq);
+   }
+   return UTRACE_DETACH;
+   }
return UTRACE_RESUME;
 }
 
@@ -85,7 +100,7 @@ crash_suspend_jctl(enum utrace_resume_ac
 * proper weirdo status, stop the rest of the
 * process group too in normal job control fashion.
 */
-   (void) kill_pgrp(find_pid(-pgid), SIGTTOU, 1);
+   (void) kill_pgrp(find_vpid(-pgid), SIGTTOU, 1);
} else if (engine->data) {
/*
 * We've been resumed after a crash.
@@ -117,6 +132,7 @@ crash_suspend_clone(enum utrace_resume_a
} else {
int err = utrace_set_events(child, child_engine, MY_EVENTS);
WARN_ON(err);
+   atomic_inc(&attach_count);
}
 
return UTRACE_RESUME;
@@ -131,13 +147,21 @@ crash_suspend_death(struct utrace_attach
struct task_struct *tsk,
bool group_dead, int signal)
 {
+   if (atomic_read(&state) == CS_STOPPING) {
+   if (atomic_dec_return(&attach_count) <= 0) {
+   wake_up(&crash_suspend_wq);
+   }
+   }
+   else {
+   atomic_dec(&attach_count);
+   }
return UTRACE_DETACH;
 }
 
-
 static const struct utrace_engine_ops crash_suspend_ops =
 {
.report_clone = crash_suspend_clone,
+   .report_quiesce = crash_suspend_quiesce,
.report_death = crash_suspend_death,
.report_signal = crash_suspend_signal,
.report_jctl = crash_suspend_jctl,
@@ -151,7 +175,7 @@ static int __init init_crash_suspend(voi
int ret;
 
rcu_read_lock();
-   target = find_task_by_pid(target_pid);
+   target = find_task_by_vpid(target_pid);
if (target)
get_task_struct(target);
rcu_read_unlock();
@@ -173,8 +197,10 @@ static int __init init_crash_suspend(voi
ret = utrace_set_events(target, engine, MY_EVENTS);
if (ret == -ESRCH)
printk("pid %d died during setup\n", target->pid);
-   else
+   else {
+   atomic_inc(&attach_count);
WARN_ON(ret);
+   }
 
WARN_ON(atomic_dec_and_test(&target->usage));
return 0;
@@ -186,6 +212,7 @@ static void __exit exit_crash_suspend(vo
struct utrace_attached_engine *engine;
int n = 0;
 
+   atomic_set(&state, CS_STOPPING);
rcu_read_lock();
for_each_process(t) {
engine = utrace_attach(t, UTRACE_ATTACH_MATCH_OPS,
@@ -197,14 +224,33 @@ static void __exit exit_crash_suspend(vo
   error, t->pid);
}
else {
-   int ret = utrace_control(t, engine, UTRACE_DETACH);
+   int ret = utrace_set_events(t, engine,
+   SHUTDOWN_EVENTS

Re: utrace doc update, nearing submission(?)

2008-07-28 Thread David Smith
Roland McGrath wrote:
 I've read through most of the docs (although my eyes certainly glazed
 over when I got to the tracehook stuff).
 
 That stuff is for kernel maintainers, as it says.  You don't really need to
 know.  For that part, just checking for typos is all the help it needs.
 
 What's there seems to be pretty good.  I think it is missing (or at
 least I missed it) a good description of how to asynchronously stop a
 thread.  What are the ins-and-outs here, knowing how/when to use
 UTRACE_STOP vs. UTRACE_INTERRUPT, etc.
 
 Yeah, the kerneldoc-driven format doesn't lend itself to a lot of
 exposition.  The details are in there in the utrace_control description.
 But I guess it's not real obvious how to put it all together.
 
 How is the Stopping Safely section?  That seems like the place to add
 something more direct about this.  Was there something other than the
 UTRACE_INTERRUPT issue that didn't seem clear?
 
 Is that not clear to you, or is it just not clear in the documentation?

Probably both - see below.

 You use UTRACE_INTERRUPT when you want to interfere with system calls in
 progress (or blocking page faults or whatever).  To interrupt and stop
 (like PTRACE_ATTACH does) and handle it already being stopped, you really
 want to do UTRACE_STOP first and see 0 if it's already stopped.  If it's
 not already stopped, you get -EINPROGRESS and then can do UTRACE_INTERRUPT
 to be sure that you interrupt it and that the race complexity of it being
 stopped before or waking up is minimized.  (If you just do UTRACE_INTERRUPT
 first, then you'll get a callback soon--unless it's stopped.  Then you
 won't, but if you do UTRACE_STOP second to see if it's stopped, then you
 could get the callback in between and your life gets more hairy.)

Ah.

That paragraph makes lots of sense - I wish it could be included
somewhere.  So, basically, you do something like this:

rc = utrace_control(t, engine, UTRACE_STOP);
if (rc == -EINPROGRESS) {
	rc = utrace_control(t, engine, UTRACE_INTERRUPT);
}
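To make the ordering concrete, here is a userspace model of that snippet. fake_thread, fake_control(), and the return value assumed for the interrupt request are hypothetical; only the "stop first, interrupt only if not already stopped" ordering comes from the explanation above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a traced thread; not a utrace type. */
struct fake_thread {
	int stopped;      /* already quiescent? */
	int interrupted;  /* did we have to interrupt it? */
};

enum fake_req { FAKE_STOP, FAKE_INTERRUPT };

/* Stand-in for utrace_control(): 0 when the thread is already stopped,
 * -EINPROGRESS when it will act on the request later. */
static int fake_control(struct fake_thread *t, enum fake_req req)
{
	if (req == FAKE_STOP)
		return t->stopped ? 0 : -EINPROGRESS;
	t->interrupted = 1;
	return -EINPROGRESS;
}

/* Request the stop first; interrupt only if it is not stopped yet. */
static int stop_thread(struct fake_thread *t)
{
	int rc = fake_control(t, FAKE_STOP);

	if (rc == -EINPROGRESS)
		rc = fake_control(t, FAKE_INTERRUPT);
	assert(rc == 0 || rc == -EINPROGRESS);
	return rc;
}
```

An already-stopped thread is never interrupted, which is the point of asking for the stop first.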




crash-suspend.c using new interface

2008-07-23 Thread David Smith
To test out the new utrace interface (I was using kernel
2.6.26-138.fc10.x86_64), I ported the crash-suspend.c example from the
old interface to the new interface.  It is attached to this email.  Feel
free to ignore/delete the extra debug printk's.

When run, it does what it is supposed to do, but triggers a BUG_ON
message when the crashed/suspended process is put back in the foreground.

Note that I don't have any real feel for whether the bug lies in my
crash-suspend.c translation or the new utrace itself.

The BUG details are in msg.txt.

#include <linux/sched.h>
#include <linux/pid.h>
#include <linux/utrace.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/errno.h>

MODULE_DESCRIPTION("automatic suspend on crash");
MODULE_LICENSE("GPL");

static int target_pid;
static int verbose;

module_param_named(pid, target_pid, int, 0);
module_param(verbose, bool, 0);

#define MY_EVENTS (UTRACE_EVENT(CLONE) | UTRACE_EVENT(DEATH) \
		   | UTRACE_EVENT(SIGNAL_CORE) | UTRACE_EVENT(JCTL))


/*
 * This is the interesting hook.
 */
static u32
crash_suspend_signal(u32 action,
		 struct utrace_attached_engine *engine,
		 struct task_struct *tsk, struct pt_regs *regs,
		 siginfo_t *info,
		 const struct k_sigaction *orig_ka,
		 struct k_sigaction *return_ka)
{
	printk("%s:%d action = 0x%x\n", __FUNCTION__, __LINE__, action);
	if (info->si_errno & 0x8000) {
		info->si_errno &= ~0x8000;
		printk("%s:%d\n", __FUNCTION__, __LINE__);
		return UTRACE_RESUME;
	}

	/*
	 * If another engine is doing something, just get out of the way.
	 */
	if ((action & UTRACE_SIGNAL_MASK) != UTRACE_SIGNAL_CORE)
		return UTRACE_RESUME;

	printk("%s:%d\n", __FUNCTION__, __LINE__);
	info->si_errno |= 0x8000;
	return UTRACE_SIGNAL_TSTP | UTRACE_SIGNAL_HOLD;
}

static u32
crash_suspend_jctl(enum utrace_resume_action action,
		   struct utrace_attached_engine *engine,
		   struct task_struct *tsk, 
		   bool notify, int type)
{
	printk("%s:%d type = 0x%x\n", __FUNCTION__, __LINE__, type);
	if (type == CLD_STOPPED) {
		int signr = tsk->exit_code;
		pid_t pgid = task_pgrp_nr(tsk);
		if (verbose)
			printk("crash-suspend stopped "
			       "pgrp %d for pid %d signal %d\n",
			       pgid, tsk->pid, signr);
		if (signr != SIGSTOP && signr != SIGTSTP
		    && signr != SIGTTOU && signr != SIGTTIN)
			/*
			 * This is an unnatural stop induced by us, above.
			 * Now that we have ourselves stopped with the
			 * proper weirdo status, stop the rest of the
			 * process group too in normal job control fashion.
			 */
			(void) kill_pgrp(find_pid(-pgid), SIGTTOU, 1);
	}
	return UTRACE_RESUME;
}


/*
 * On clone, attach to the child.
 */
static u32
crash_suspend_clone(enum utrace_resume_action action,
		struct utrace_attached_engine *engine,
		struct task_struct *parent,
		unsigned long clone_flags,
		struct task_struct *child)
{
	struct utrace_attached_engine *child_engine;

	printk("%s:%d\n", __FUNCTION__, __LINE__);
	child_engine = utrace_attach(child, UTRACE_ATTACH_CREATE,
				     engine->ops, 0);
	if (IS_ERR(child_engine)) {
		printk("attach to clone child %d (%lx) from 0x%p = %ld\n",
		       child->pid, clone_flags, engine, PTR_ERR(child_engine));
	}
	else
		utrace_set_events(child, child_engine, MY_EVENTS);

	return UTRACE_RESUME;
}

/*
 * If we are still attached at task death, it didn't die by core dump signal.
 * Just detach and let it go.
 */
static u32
crash_suspend_death(struct utrace_attached_engine *engine,
		struct task_struct *tsk,
		bool group_dead, int signal)
{
	printk("%s:%d\n", __FUNCTION__, __LINE__);
	return UTRACE_DETACH;
}


static const struct utrace_engine_ops crash_suspend_ops =
{
	.report_clone = crash_suspend_clone,
	.report_death = crash_suspend_death,
	.report_signal = crash_suspend_signal,
	.report_jctl = crash_suspend_jctl,
};

static int __init init_crash_suspend(void)
{
	struct task_struct *target;
	struct utrace_attached_engine *engine;

	rcu_read_lock();
	target = find_task_by_pid(target_pid);
	if (target)
		get_task_struct(target);
	rcu_read_unlock();

	if (target == NULL) {
		printk("cannot find PID %d\n", target_pid);
		return -ESRCH;
	}

	engine = utrace_attach(target, UTRACE_ATTACH_CREATE,
			       &crash_suspend_ops, 0);
	if (IS_ERR(engine))
		printk("utrace_attach: %ld\n", PTR_ERR(engine));
	else if (engine == NULL)
		printk("utrace_attach = null!\n");
	else {
		printk("attached to %d = 0x%p\n", target->pid, engine);
		/* Only enable events on a valid engine. */
		utrace_set_events(target, engine, MY_EVENTS);
	}

	/* Drop the reference taken by get_task_struct() above. */
	WARN_ON(atomic_dec_and_test(&target->usage));
	return 0;
}

static void __exit exit_crash_suspend(void)
{
	struct task_struct *t;
	struct utrace_attached_engine *engine;
	int n = 0;

	rcu_read_lock();
	for_each_process(t) {
		engine = utrace_attach(t, UTRACE_ATTACH_MATCH_OPS,
				       &crash_suspend_ops, 0);
		if (IS_ERR(engine)) {
			int error = -PTR_ERR(engine);
			if (error != ENOENT)
				printk("!!! utrace_attach returned %d on %d\n"

Re: Tracing Syscalls under Fedora 9

2008-06-09 Thread David Smith
Martin Süßkraut wrote:
 Hi,
 
 has the tracing of system calls changed in utrace between Fedora 8 and 9?
 
 My module works fine under Fedora 8, but under Fedora 9 the callbacks
 report_syscall_entry and report_syscall_exit seem not to be invoked
 any more.

If it helps at all, systemtap is seeing the same thing - system call
callbacks that got called under F8 aren't getting called under F9.





need quiesce help

2008-06-09 Thread David Smith
In systemtap (http://sourceware.org/systemtap/), we're using utrace to
be able to put systemtap probes in arbitrary threads.  Internally, there
are two fairly separate and distinct layers, each using utrace on the
same thread.

This has worked well.  However, I'm having trouble when both layers
request UTRACE_ACTION_QUIESCE.

The upper layer does:

utrace_set_flags(tsk, engine, UTRACE_ACTION_QUIESCE|UTRACE_EVENT(QUIESCE));

Then, in the .report_quiesce handler, it does its processing and returns
UTRACE_ACTION_DETACH.  The upper layer is working correctly.

The lower layer does:

utrace_set_flags(tsk, engine,
		 UTRACE_ACTION_QUIESCE|UTRACE_EVENT(QUIESCE)|UTRACE_EVENT(DEATH));

After the lower layer's .report_quiesce handler is called, I'd like to
turn off quiesce handling for this engine and leave on
UTRACE_EVENT(DEATH) handling.  I've tried various combinations of return
values and calling utrace_set_flags() in the .report_quiesce handler,
but this handler always seems to get called twice (when the upper layer
also does a UTRACE_ACTION_QUIESCE).

So, what does the lower layer need to do to correctly turn off quiesce
handling for the engine in this case?  Note that both layers' engines
get installed on the thread in the same callback.

Thanks for the help.
