Re: Task watchers v2

2006-12-19 Thread Paul Jackson
Matt wrote:
> Previous iterations of task watchers would prevent the code in these
> paths from being inlined. Furthermore, the code certainly wouldn't be
> placed near the table of function pointers (which was in an entirely
> different ELF section). By placing them adjacent to each other in the
> same ELF section we can improve the likelihood of cache hits in
> fork-heavy workloads (which were the ones that showed a performance
> decrease in the previous iteration of these patches).

Ah so - by marking some of the fork (and exit, exec, ...) routines
with the WATCH_TASK_* mechanism, you can compact them together in the
kernel's text pages, instead of having them scattered about based on
whatever source files they are in.

Nice.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: Task watchers v2

2006-12-19 Thread Matt Helsley
On Mon, 2006-12-18 at 21:41 -0800, Paul Jackson wrote:
> Matt wrote:
> > - Task watchers can actually improve kernel performance slightly (up to
> > 2% in extremely fork-heavy workloads for instance).
> 
> Nice.
> 
> Could you explain why?

After the last round of patches I set out to improve instruction and
data cache hits.

Previous iterations of task watchers would prevent the code in these
paths from being inlined. Furthermore, the code certainly wouldn't be
placed near the table of function pointers (which was in an entirely
different ELF section). By placing them adjacent to each other in the
same ELF section we can improve the likelihood of cache hits in
fork-heavy workloads (which were the ones that showed a performance
decrease in the previous iteration of these patches).

Suppose we have two functions to invoke during fork -- A and B. Here's
what the memory layout looked like in the previous iteration of task
watchers:

+--------------+<------------+
|  insns of B  |             |
|      ...     |             |
+--------------+             |
       ...                   |
+--------------+<---------+  |
|  insns of A  |          |  |
|      ...     |          |  |
+--------------+          |  |
       ...                |  |   .text
==========================|==|== ELF Section Boundary ==
+--------------+          |  |   .task
| pointer to A |----------+  |
+--------------+             |
| pointer to B |-------------+
+--------------+

The notify_task_watchers() function would first load the pointer to A from
the .task section. Then it would immediately jump into the .text section and
force the instructions from A to be loaded. When A was finished, it would
return to notify_task_watchers() only to jump into B by the same steps.

As you can see, things can be rather spread out. Unless the compiler inlined
the functions called from copy_process(), things are very similar in a
mainline kernel -- copy_process() could be jumping to rather distant portions
of the kernel text, and the pointer table would be rather distant from the
instructions to be loaded.
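
In code terms, the walk notify_task_watchers() does amounts to something like
the sketch below. The symbol and type names here are illustrative rather than
the ones the patch defines, and the patch actually keeps a separate table per
event; this just shows the shape of the loop over the pointer table:

/*
 * Sketch only -- illustrative names, assuming the linker script exports
 * start/end symbols that bound the pointer table.
 */
struct task_struct;

typedef int (*task_watcher_fn)(unsigned int event, unsigned long flags,
                               struct task_struct *tsk);

extern task_watcher_fn __task_watchers_start[], __task_watchers_end[];

int notify_task_watchers(unsigned int event, unsigned long flags,
                         struct task_struct *tsk)
{
        task_watcher_fn *fn;
        int ret = 0;

        /* Walk the pointers in link order -- A first, then B. */
        for (fn = __task_watchers_start; fn != __task_watchers_end; fn++) {
                ret = (*fn)(event, flags, tsk);
                if (ret)        /* a watcher may veto, e.g. at WATCH_TASK_INIT */
                        break;
        }
        return ret;
}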

Here's what the new patches look like:

===
+--------------+   .task
| pointer to A |-----+
+--------------+     |
| pointer to B |-----|--+
+--------------+     |  |
       ...           |  |
+--------------+<----+  |
|  insns of A  |        |
|      ...     |        |
+--------------+<-------+
|  insns of B  |
|      ...     |
+--------------+
===

This layout is clearly more compact and also follows the order of the calls
(A then B). The instructions are all in the same section. When A finishes
executing we soon jump into B, which could be in the same instruction
cache line as the function we just left. Furthermore, since the sequence
always goes from A to B, I expect some anticipatory loads could be done.
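
Mechanically, the layout above falls out of putting both the function bodies
and their table entries into dedicated sections, in link order. A minimal
sketch -- the attribute and section names here are made up; the real macros
live in the init.h changes:

/* Sketch only -- illustrative section names. */
struct task_struct;

typedef int (*task_watcher_fn)(unsigned int event, unsigned long flags,
                               struct task_struct *tsk);

/* The function body goes into the watcher text group... */
static int __attribute__((__section__(".task.text")))
watch_fork_a(unsigned int event, unsigned long flags, struct task_struct *tsk)
{
        /* ... A's work on the new task ... */
        return 0;
}

/* ...and its pointer lands in the adjacent table, so the table entry and
 * the instructions it points at stay close together. */
static task_watcher_fn
__attribute__((__section__(".task.table"), __used__))
watch_fork_a_entry = watch_fork_a;

Since the linker emits the entries in the order the objects are linked, the
table order matches the call order (A then B) without any runtime
registration.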

For fork-heavy workloads I'd expect this to explain the performance
difference. For workloads that aren't fork-heavy I suspect we're just as
likely to experience instruction cache misses -- whether the functions
are inlined, adjacent, or not -- since the fork happens relatively
infrequently and other instructions are likely to push fork-related
instructions out.

Cheers,
-Matt Helsley



Re: Task watchers v2

2006-12-18 Thread Paul Jackson
Matt wrote:
> - Task watchers can actually improve kernel performance slightly (up to
> 2% in extremely fork-heavy workloads for instance).

Nice.

Could you explain why?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: Task watchers v2

2006-12-18 Thread Matt Helsley
On Mon, 2006-12-18 at 13:44 +0800, Zhang, Yanmin wrote:
> On Thu, 2006-12-14 at 16:07 -0800, Matt Helsley wrote:
> > plain text document attachment (task-watchers-v2)
> > Associate function calls with significant events in a task's lifetime much like
> > we handle kernel and module init/exit functions. This creates a table for each
> > of the following events in the task_watchers_table ELF section:
> >
> > WATCH_TASK_INIT at the beginning of a fork/clone system call when the
> > new task struct first becomes available.
> >
> > WATCH_TASK_CLONE just before returning successfully from a fork/clone.
> >
> > WATCH_TASK_EXEC just before successfully returning from the exec
> > system call.
> >
> > WATCH_TASK_UID every time a task's real or effective user id changes.
> >
> > WATCH_TASK_GID every time a task's real or effective group id changes.
> >
> > WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
> > for any reason.
> >
> > WATCH_TASK_FREE is called before critical task structures like
> > the mm_struct become inaccessible and the task is subsequently freed.
> >
> > The next patch will add a debugfs interface for measuring fork and exit rates
> > which can be used to calculate the overhead of the task watcher infrastructure.
> >
> > Subsequent patches will make use of task watchers to simplify fork, exit,
> > and many of the system calls that set [er][ug]ids.
> It's easier to get such watch capabilities with kprobes/systemtap. Why
> add new code to the kernel?

Good question! Disclaimer: everything I know about kprobes I learned
from Documentation/kprobes.txt.

The task watchers patches have a few distinguishing capabilities, yet lack
some capabilities that kprobes provides -- so neither is a replacement
for the other. Specifically:

- Task watchers are for use by the kernel for more than profiling and
debugging. They need to work even when kernel debugging and
instrumentation are disabled.

- Task watchers do not need to be dynamically enabled, disabled, or
removed (though dynamic insertion would be nice -- I'm working on that).
In fact I've been told that dynamically enabling, disabling, or removing
them would incur unacceptable complexity and/or cost for an
uninstrumented kernel.

- Task watchers don't require arch support. They use completely generic
code.
- Since they are written into the code, task watchers don't need
  to modify instructions.

- Task watchers don't need to single-step instructions.

- Task watchers don't need to know about arch registers, calling
  conventions, etc. to work

- Task watchers don't need to have the same (possibly extensive)
argument list as the function being "probed". This makes maintenance
easier -- no need to keep the signature of the watchers in synch with
the signature of the "probed" function.

- Task watchers don't require MODULES (2.6.20-rc1-mm1's
arch/i386/Kconfig suggests this is true of kprobes).

- Task watchers don't need kernel symbols.

- Task watchers can affect flow control (see the patch hunks that change
copy_process()) with their return value. There's a short sketch of this
right after this list.

- Task watchers do not need to know the instruction address to be
"probed".

- Task watchers can actually improve kernel performance slightly (up to
2% in extremely fork-heavy workloads for instance).

- Task watchers can take local variables of the calling function as
arguments -- not necessarily the arguments of the "probed" function.

- Task watchers don't care if preemption is enabled or disabled.

- Task watchers could sleep if they want to.
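
Here is the sketch promised above for the flow-control point. It is not the
actual copy_process() hunk -- the WATCH_TASK_INIT value and the meaning of
the middle argument are assumptions here -- it just shows the shape of a
veto:

#define WATCH_TASK_INIT 0       /* illustrative value only */

struct task_struct;

int notify_task_watchers(unsigned int event, unsigned long flags,
                         struct task_struct *tsk);

/* Called early in copy_process(); a non-zero return aborts the fork. */
static int run_task_init_watchers(struct task_struct *p,
                                  unsigned long clone_flags)
{
        int err = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);

        if (err)
                return err;     /* copy_process() unwinds and fails the fork */
        return 0;
}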

So, to the best of my knowledge, kprobes isn't a replacement for task
watchers, nor are task watchers capable of replacing kprobes.

Cheers,
-Matt Helsley



Re: Task watchers v2

2006-12-17 Thread Zhang, Yanmin
On Thu, 2006-12-14 at 16:07 -0800, Matt Helsley wrote:
> plain text document attachment (task-watchers-v2)
> Associate function calls with significant events in a task's lifetime much like
> we handle kernel and module init/exit functions. This creates a table for each
> of the following events in the task_watchers_table ELF section:
> 
> WATCH_TASK_INIT at the beginning of a fork/clone system call when the
> new task struct first becomes available.
> 
> WATCH_TASK_CLONE just before returning successfully from a fork/clone.
> 
> WATCH_TASK_EXEC just before successfully returning from the exec
> system call.
> 
> WATCH_TASK_UID every time a task's real or effective user id changes.
> 
> WATCH_TASK_GID every time a task's real or effective group id changes.
> 
> WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
> for any reason. 
> 
> WATCH_TASK_FREE is called before critical task structures like
> the mm_struct become inaccessible and the task is subsequently freed.
> 
> The next patch will add a debugfs interface for measuring fork and exit rates
> which can be used to calculate the overhead of the task watcher infrastructure.
> 
> Subsequent patches will make use of task watchers to simplify fork, exit,
> and many of the system calls that set [er][ug]ids.
It's easier to get such watch capabilities with kprobes/systemtap. Why
add new code to the kernel?


Re: Task watchers v2

2006-12-15 Thread Matt Helsley
On Fri, 2006-12-15 at 09:34 +0100, Christoph Hellwig wrote:
> On Thu, Dec 14, 2006 at 04:07:55PM -0800, Matt Helsley wrote:
> > Associate function calls with significant events in a task's lifetime much like
> > we handle kernel and module init/exit functions. This creates a table for each
> > of the following events in the task_watchers_table ELF section:
> >
> > WATCH_TASK_INIT at the beginning of a fork/clone system call when the
> > new task struct first becomes available.
> >
> > WATCH_TASK_CLONE just before returning successfully from a fork/clone.
> >
> > WATCH_TASK_EXEC just before successfully returning from the exec
> > system call.
> >
> > WATCH_TASK_UID every time a task's real or effective user id changes.
> >
> > WATCH_TASK_GID every time a task's real or effective group id changes.
> >
> > WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
> > for any reason.
> >
> > WATCH_TASK_FREE is called before critical task structures like
> > the mm_struct become inaccessible and the task is subsequently freed.
> >
> > The next patch will add a debugfs interface for measuring fork and exit rates
> > which can be used to calculate the overhead of the task watcher infrastructure.
> 
> What's the point of the ELF hackery? This code would be a lot simpler
> and more understandable if you simply had task_watcher_ops and a
> register / unregister function for it.

A bit more verbose response:

I posted a notifier chain implementation back in June that bears some
resemblance to your suggestion -- a structure needed to be registered at
runtime. There was a single global list of them to iterate over for each
event.

This patch and the following patches are significantly shorter than
their counterparts in that series. They avoid iterating over elements
with empty ops. The way function pointers and function bodies are
grouped together by this series should improve locality. The fact that
there's no locking required also makes it simpler to analyze and use.

The patches to allow modules to register task watchers do make things
more complex though -- that does require a list and a lock. However, the
lock does not need to be taken in the fork/exec/etc paths if we pin the
module. In contrast, your suggested approach is simpler because it
doesn't treat modules any differently. Overall, though, I think the
balance still favors these patches.

Cheers,
-Matt Helsley



Re: Task watchers v2

2006-12-15 Thread Matt Helsley
On Fri, 2006-12-15 at 09:34 +0100, Christoph Hellwig wrote:
> On Thu, Dec 14, 2006 at 04:07:55PM -0800, Matt Helsley wrote:
> > Associate function calls with significant events in a task's lifetime much like
> > we handle kernel and module init/exit functions. This creates a table for each
> > of the following events in the task_watchers_table ELF section:
> >
> > WATCH_TASK_INIT at the beginning of a fork/clone system call when the
> > new task struct first becomes available.
> >
> > WATCH_TASK_CLONE just before returning successfully from a fork/clone.
> >
> > WATCH_TASK_EXEC just before successfully returning from the exec
> > system call.
> >
> > WATCH_TASK_UID every time a task's real or effective user id changes.
> >
> > WATCH_TASK_GID every time a task's real or effective group id changes.
> >
> > WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
> > for any reason.
> >
> > WATCH_TASK_FREE is called before critical task structures like
> > the mm_struct become inaccessible and the task is subsequently freed.
> >
> > The next patch will add a debugfs interface for measuring fork and exit rates
> > which can be used to calculate the overhead of the task watcher infrastructure.
> 
> What's the point of the ELF hackery? This code would be a lot simpler
> and more understandable if you simply had task_watcher_ops and a
> register / unregister function for it.

Andrew asked me to avoid locking and added complexity in the code that
uses one or more task watchers. The ELF hackery helps me avoid locking
in the fork/exit/etc paths that call the "registered" function.
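
For contrast, with runtime registration the hot path would look roughly like
the sketch below (assuming a plain rwlock-protected list; the names are
illustrative, not from any posted patch). That read_lock() on every
fork/exit/exec is what the link-time tables avoid:

#include <linux/list.h>
#include <linux/spinlock.h>

struct task_struct;

struct task_watcher {
        struct list_head list;
        int (*fn)(unsigned int event, unsigned long flags,
                  struct task_struct *tsk);
};

static LIST_HEAD(watcher_list);
static DEFINE_RWLOCK(watcher_lock);

static int notify_task_watchers_locked(unsigned int event, unsigned long flags,
                                       struct task_struct *tsk)
{
        struct task_watcher *w;
        int ret = 0;

        read_lock(&watcher_lock);       /* taken on every fork/exit/exec */
        list_for_each_entry(w, &watcher_list, list) {
                ret = w->fn(event, flags, tsk);
                if (ret)
                        break;
        }
        read_unlock(&watcher_lock);
        return ret;
}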

Cheers,
-Matt Helsley



Re: Task watchers v2

2006-12-15 Thread Christoph Hellwig
On Thu, Dec 14, 2006 at 04:07:55PM -0800, Matt Helsley wrote:
> Associate function calls with significant events in a task's lifetime much like
> we handle kernel and module init/exit functions. This creates a table for each
> of the following events in the task_watchers_table ELF section:
> 
> WATCH_TASK_INIT at the beginning of a fork/clone system call when the
> new task struct first becomes available.
> 
> WATCH_TASK_CLONE just before returning successfully from a fork/clone.
> 
> WATCH_TASK_EXEC just before successfully returning from the exec
> system call.
> 
> WATCH_TASK_UID every time a task's real or effective user id changes.
> 
> WATCH_TASK_GID every time a task's real or effective group id changes.
> 
> WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
> for any reason. 
> 
> WATCH_TASK_FREE is called before critical task structures like
> the mm_struct become inaccessible and the task is subsequently freed.
> 
> The next patch will add a debugfs interface for measuring fork and exit rates
> which can be used to calculate the overhead of the task watcher infrastructure.

What's the point of the ELF hackery? This code would be a lot simpler
and more understandable if you simply had task_watcher_ops and a
register / unregister function for it.



Task watchers v2

2006-12-14 Thread Matt Helsley
Associate function calls with significant events in a task's lifetime much like
we handle kernel and module init/exit functions. This creates a table for each
of the following events in the task_watchers_table ELF section:

WATCH_TASK_INIT at the beginning of a fork/clone system call when the
new task struct first becomes available.

WATCH_TASK_CLONE just before returning successfully from a fork/clone.

WATCH_TASK_EXEC just before successfully returning from the exec
system call.

WATCH_TASK_UID every time a task's real or effective user id changes.

WATCH_TASK_GID every time a task's real or effective group id changes.

WATCH_TASK_EXIT at the beginning of do_exit when a task is exiting
for any reason. 

WATCH_TASK_FREE is called before critical task structures like
the mm_struct become inaccessible and the task is subsequently freed.

The next patch will add a debugfs interface for measuring fork and exit rates
which can be used to calculate the overhead of the task watcher infrastructure.

Subsequent patches will make use of task watchers to simplify fork, exit,
and many of the system calls that set [er][ug]ids.
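
As a rough illustration of how one of these per-event tables gets populated
(the registration macro and section naming below are made up for the sketch;
the real definitions are in the include/linux/init.h changes), a watcher for
WATCH_TASK_UID might look like:

struct task_struct;

typedef int (*task_watcher_fn)(unsigned int event, unsigned long flags,
                               struct task_struct *tsk);

/* Hypothetical registration macro, in the spirit of module_init(). */
#define task_watcher(event, fn)                                         \
        static task_watcher_fn __attribute__((__used__,                 \
                __section__(".task_watchers." #event)))                 \
        __watch_##event##_##fn = (fn)

static int report_uid_change(unsigned int event, unsigned long flags,
                             struct task_struct *tsk)
{
        /* invoked from the WATCH_TASK_UID call sites in kernel/sys.c below */
        return 0;       /* zero means "carry on" */
}

task_watcher(WATCH_TASK_UID, report_uid_change);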

Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Cc: Jes Sorensen <[EMAIL PROTECTED]>
Cc: Christoph Hellwig <[EMAIL PROTECTED]>
Cc: Al Viro <[EMAIL PROTECTED]>
Cc: Steve Grubb <[EMAIL PROTECTED]>
Cc: linux-audit@redhat.com
Cc: Paul Jackson <[EMAIL PROTECTED]>
---
 fs/exec.c |3 ++
 include/asm-generic/vmlinux.lds.h |   26 +
 include/linux/init.h  |   47 ++
 kernel/Makefile   |2 -
 kernel/exit.c |3 ++
 kernel/fork.c |   15 
 kernel/sys.c  |9 +++
 kernel/task_watchers.c|   41 +
 8 files changed, 141 insertions(+), 5 deletions(-)

Index: linux-2.6.19/kernel/sys.c
===
--- linux-2.6.19.orig/kernel/sys.c
+++ linux-2.6.19/kernel/sys.c
@@ -27,10 +27,11 @@
 #include <linux/suspend.h>
 #include <linux/tty.h>
 #include <linux/signal.h>
 #include <linux/cn_proc.h>
 #include <linux/getcpu.h>
+#include <linux/init.h>
 
 #include <linux/compat.h>
 #include <linux/syscalls.h>
 #include <linux/kprobes.h>
 
@@ -957,10 +958,11 @@ asmlinkage long sys_setregid(gid_t rgid,
current->fsgid = new_egid;
current->egid = new_egid;
current->gid = new_rgid;
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
+   notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
 }
 
 /*
  * setgid() is implemented like SysV w/ SAVED_IDS 
@@ -992,10 +994,11 @@ asmlinkage long sys_setgid(gid_t gid)
else
return -EPERM;
 
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
+   notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
 }
   
 static int set_user(uid_t new_ruid, int dumpclear)
 {
@@ -1080,10 +1083,11 @@ asmlinkage long sys_setreuid(uid_t ruid,
current->suid = current->euid;
current->fsuid = current->euid;
 
key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
+   notify_task_watchers(WATCH_TASK_UID, 0, current);
 
return security_task_post_setuid(old_ruid, old_euid, old_suid, 
LSM_SETID_RE);
 }
 
 
@@ -1127,10 +1131,11 @@ asmlinkage long sys_setuid(uid_t uid)
current->fsuid = current->euid = uid;
current->suid = new_suid;
 
key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
+   notify_task_watchers(WATCH_TASK_UID, 0, current);
 
return security_task_post_setuid(old_ruid, old_euid, old_suid, 
LSM_SETID_ID);
 }
 
 
@@ -1175,10 +1180,11 @@ asmlinkage long sys_setresuid(uid_t ruid
if (suid != (uid_t) -1)
current->suid = suid;
 
key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
+   notify_task_watchers(WATCH_TASK_UID, 0, current);
 
return security_task_post_setuid(old_ruid, old_euid, old_suid, 
LSM_SETID_RES);
 }
 
 asmlinkage long sys_getresuid(uid_t __user *ruid, uid_t __user *euid, uid_t __user *suid)
@@ -1227,10 +1233,11 @@ asmlinkage long sys_setresgid(gid_t rgid
if (sgid != (gid_t) -1)
current->sgid = sgid;
 
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
+   notify_task_watchers(WATCH_TASK_GID, 0, current);
return 0;
 }
 
 asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid)
 {
@@ -1268,10 +1275,11 @@ asmlinkage long sys_setfsuid(uid_t uid)
current->fsuid = uid;
}
 
key_fsuid_changed(current);
proc_id_connector(current, PROC_EVENT_UID);
+   notify_task_watchers(WATCH_TASK_UID, 0, current);
 
security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, 
LSM_SETID_FS);
 
return old_fsuid;
 }
@@ 
