Re: [PATCH 20/27] ptrace: arch_has_block_step

2007-11-28 Thread David Wilder

Roland McGrath wrote:



+#ifndef arch_has_block_step
+/**
+ * arch_has_block_step - does this CPU support user-mode block-step?
+ *
+ * If this is defined, then there must be a function declaration or inline
+ * for user_enable_block_step(), and arch_has_single_step() must be defined
+ * too.  arch_has_block_step() should evaluate to nonzero iff the machine
+ * supports step-until-branch for user mode.  It can be a constant or it
+ * can test a CPU feature bit.
+ */
+#define arch_has_single_step() (0)


Should this be #define arch_has_block_step() (0)?


+
+/**
+ * user_enable_block_step - step until branch in user-mode task
+ * @task: either current or a task stopped in %TASK_TRACED
+ *
+ * This can only be called when arch_has_block_step() has returned nonzero,
+ * and will never be called when single-instruction stepping is being used.
+ * Set @task so that when it returns to user mode, it will trap after the
+ * next branch or trap taken.
+ */
+static inline void user_enable_block_step(struct task_struct *task)
+{
+   BUG();  /* This can never be called.  */
+}
+#endif /* arch_has_block_step */
+
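A minimal user-space sketch of the fallback pattern in the quoted patch, assuming the default macro is indeed meant to be arch_has_block_step() as the question above suggests. The struct task_struct stub, the BUG() stand-in and the try_block_step() caller are hypothetical and exist only so the fragment builds outside the kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins so the sketch builds in user space. */
struct task_struct { int block_step_enabled; };
#define BUG() abort()

#ifndef arch_has_block_step
/* Default: the architecture has no user-mode block-step support. */
#define arch_has_block_step() (0)

static inline void user_enable_block_step(struct task_struct *task)
{
	(void)task;
	BUG();	/* Never reached: callers must test arch_has_block_step(). */
}
#endif /* arch_has_block_step */

/*
 * Hypothetical caller: enable block-step only when the architecture
 * reports support.  Returns 0 on success, -1 if unsupported.
 */
static int try_block_step(struct task_struct *task)
{
	assert(task != NULL);
	if (!arch_has_block_step())
		return -1;
	user_enable_block_step(task);
	return 0;
}
```

With the default definition the caller always takes the "unsupported" path, so the BUG() body is never executed. An architecture that supports block-step would define both names before this fallback is seen.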


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Submission of Trace patches

2007-11-13 Thread David Wilder

Andrew-

Please see my current submission of the "Trace" patches at:
http://lkml.org/lkml/2007/11/12/281
http://lkml.org/lkml/2007/11/12/282
http://lkml.org/lkml/2007/11/12/283
http://lkml.org/lkml/2007/11/12/284

I believe this code is now ready for inclusion in the mm tree. Please 
consider this request to move trace into the mm tree.


Regards
  David Wilder


Re: [patch 1/3] Trace code and documentation

2007-10-04 Thread David Wilder

Andi Kleen wrote:

On Thu, Oct 04, 2007 at 12:19:35PM -0700, David Wilder wrote:

Andi Kleen wrote:

"David J. Wilder" <[EMAIL PROTECTED]> writes:

@@ -0,0 +1,160 @@
+Trace Setup and Control
+===
+In the kernel, the trace interface provides a simple mechanism for
+starting and managing data channels (traces) to user space.

Wasn't relayfs supposed to do that already? Why do you need another
wrapper around it?

The code in trace is exactly what all the current users of relay do.
Therefore trace reduces the duplication of code.


If everybody does this then the code should be just put into
relayfs?


I disagree.  I think keeping the code separate (layering, if you will)
makes it easier to use and maintain.







Is this also really still faster than a printk below log level
(without console driver overhead)? If not then why not just
use printk?

Are you arguing against relayfs or trace?  Trace just makes relayfs
easier to use.  I think relayfs can stand up for itself.


I'm arguing against complicated trace mechanisms that are not fast.


What makes trace complicated?  It is just open, start/stop, close.  I
can't see how a trace API could be any simpler.




At some point when I looked at relayfs it seemed to be reasonably
fast (per-CPU buffers; not much locking; overhead per call roughly
like putchar()), but that might have regressed.


No regression has occurred.  According to the relay documentation, if you
use global buffers you must use locking.  If you don't want to use
locking, use per-cpu buffers.




Your example module with its lock definitely looks very slow and I don't approve
of it.



If you don't approve of the locking then use per-cpu buffers.  The
example will do either.




The example shows a way to create an ASCII data layer.


ASCII layers don't make much sense imho -- these should just use printk.



So the only way I should pass ASCII to user space is using printk?  I 
don't understand that.  Again nothing in trace limits you to ASCII data.



Fast dedicated binary log channels make sense though; but you don't
seem really to be very concentrated on that.


I impose no restriction on what type of data you can pass over trace's 
fast dedicated channels.




True, to make trace "fast" you need a data layer that can handle the
requirements of per-cpu buffers.  However there are still advantages of
trace over printk even when using global buffers: selectable buffer
sizes,


printk has selectable buffer sizes too.


   "Long term we probably want more complex tracing based on lttng,
but I'm a big fan of starting out simple and doing incremental
changes."


It's just that relayfs + another not simple layer are definitely not simple.

For a simple logger I'm thinking more like something like SGI's old
ktrace module (which undoubtedly many other people have recreated many
times for specific debugging scenarios)

But that all only makes sense if the overhead is really kept low
and I don't see that in your approach.


Is your complaint with the overhead of setting up a trace channel or the
overhead of writing to a trace channel?  For the latter, trace adds
almost no overhead on top of relay.





One advantage of the trace approach is separating control and data
layers; therefore trace can support multiple data layers to fit multiple
requirements.


I have my ideas on how to develop a data layer; others may have their own
ideas and I welcome the input.


relayfs was supposed to be that data layer.


I am using the layer definitions described in trace.txt.  In this 
definition relay is a buffering layer.




PS: Systemtap has been criticized for introducing out-of-tree kernel 
code.  A clear direction from the community is to move re-usable code 
in-tree where it can be maintained.  Trace is a move in that direction.


I'm all for that. I believe a simple fast efficient no frills logger
would serve systemtap just fine too. But the approach here seems
to be more to add all kinds of knobs and whizzles until you end
up with something as slow as printk. And since we already have
printk another one just doesn't seem to make much sense.


If by knobs you mean the trace controls, the only one that has any
effect on the "speed" of tracing is the control to start and stop
tracing, and that has been designed to impose the minimum possible
impact (one "if" in the tracing path).
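The "one if" point can be pictured with a small user-space model. This is only a sketch under stated assumptions: the state names, struct trace_chan and the trace_open()/trace_start()/trace_stop()/trace_write() helpers are hypothetical and are not taken from the trace patches. The idea shown is that the control layer costs a single state comparison per event; the size check that follows belongs to the buffering layer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical channel states: open -> start/stop -> close. */
enum trace_state { TRACE_SETUP, TRACE_RUNNING, TRACE_STOPPED };

struct trace_chan {
	enum trace_state state;
	char buf[256];
	size_t used;
	unsigned long dropped;
};

static void trace_open(struct trace_chan *c)
{
	memset(c, 0, sizeof(*c));
	c->state = TRACE_SETUP;
}

static void trace_start(struct trace_chan *c) { c->state = TRACE_RUNNING; }
static void trace_stop(struct trace_chan *c)  { c->state = TRACE_STOPPED; }

/* Write path: the control layer adds one state test per event. */
static size_t trace_write(struct trace_chan *c, const void *data, size_t len)
{
	assert(c != NULL && data != NULL);
	if (c->state != TRACE_RUNNING)		/* the single "if" */
		return 0;
	if (len > sizeof(c->buf) - c->used) {	/* buffering-layer check */
		c->dropped++;
		return 0;
	}
	memcpy(c->buf + c->used, data, len);
	c->used += len;
	return len;
}
```

In this model a channel that has been set up but not started, or that has been stopped, rejects writes after one comparison and touches nothing else.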




-Andi





Re: [patch 1/3] Trace code and documentation

2007-10-04 Thread David Wilder

Andi Kleen wrote:

"David J. Wilder" <[EMAIL PROTECTED]> writes:

@@ -0,0 +1,160 @@
+Trace Setup and Control
+===
+In the kernel, the trace interface provides a simple mechanism for
+starting and managing data channels (traces) to user space.


Wasn't relayfs supposed to do that already? Why do you need another
wrapper around it? 


The code in trace is exactly what all the current users of relay do.
Therefore trace reduces the duplication of code.





Is this also really still faster than a printk below log level
(without console driver overhead)? If not then why not just
use printk?


Are you arguing against relayfs or trace?  Trace just makes relayfs
easier to use.  I think relayfs can stand up for itself.





Especially your example is worrying. It essentially defines a new
printk. I think there is a case for a fast logging subsystem because
printk() is admittedly a little slow [somewhat slow below log level
and incredibly slow above it]

But fast means binary items (not sprintf), no global locks, not
multiple layers, per CPU etc. But your example and this patch have all
this and I bet it is not very fast.



Each user of trace has its own requirements for passing data over 
relayfs channels. This is why the documentation describes separate 
control and data layers.  The trace API provides a control layer with 
this flexibility.


The example shows a way to create an ASCII data layer.  The format of
the data (binary or ASCII) is just a function of how the data layer
formats it.
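To illustrate that the format is decided entirely by the data layer, here is a minimal user-space sketch that emits the same event either as text or as a raw record. The struct fork_event record and the emit_ascii()/emit_binary() helpers are hypothetical and are not part of the trace patches or the sample module.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event record; not taken from the trace patches. */
struct fork_event {
	uint32_t parent_pid;
	uint32_t child_pid;
};

/* ASCII data layer: format the event as a line of text. */
static size_t emit_ascii(char *buf, size_t size, const struct fork_event *ev)
{
	int n;

	assert(buf != NULL && ev != NULL);
	n = snprintf(buf, size, "fork %u -> %u\n",
		     (unsigned)ev->parent_pid, (unsigned)ev->child_pid);
	/* Report 0 bytes if formatting failed or the text was truncated. */
	return (n < 0 || (size_t)n >= size) ? 0 : (size_t)n;
}

/* Binary data layer: copy the fixed-size record unchanged. */
static size_t emit_binary(char *buf, size_t size, const struct fork_event *ev)
{
	assert(buf != NULL && ev != NULL);
	if (size < sizeof(*ev))
		return 0;
	memcpy(buf, ev, sizeof(*ev));
	return sizeof(*ev);
}
```

Either helper could feed the same channel; the control and buffering layers underneath would not need to know which one was used.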


Locking is only required when using global buffers. The option of
selecting per-cpu vs global buffers is available to the trace user.
The example (and the documentation) shows how to use both methods (See:
#define USE_GLOBAL_BUFFER in the example).
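A user-space model of the two buffering choices, assuming a build-time switch named like the USE_GLOBAL_BUFFER define mentioned above. Everything else here is hypothetical: NR_CPUS_MODEL, the lock_acquisitions counter that stands in for a spinlock, and the log_event() helper. It is a sketch of the trade-off, not the sample module's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS_MODEL 4		/* hypothetical CPU count for the model */
#define BUF_SIZE 128

struct buf {
	char data[BUF_SIZE];
	size_t used;
};

static unsigned long lock_acquisitions;	/* stands in for a spinlock */

#ifdef USE_GLOBAL_BUFFER
static struct buf global_buf;			/* shared by all CPUs: needs a lock */
#else
static struct buf percpu_buf[NR_CPUS_MODEL];	/* one per CPU: no lock */
#endif

static size_t buf_append(struct buf *b, const void *data, size_t len)
{
	if (len > BUF_SIZE - b->used)
		return 0;
	memcpy(b->data + b->used, data, len);
	b->used += len;
	return len;
}

/* Log one event from the given (modelled) CPU. */
static size_t log_event(unsigned int cpu, const void *data, size_t len)
{
	size_t n;

	assert(cpu < NR_CPUS_MODEL);
#ifdef USE_GLOBAL_BUFFER
	lock_acquisitions++;		/* spin_lock() would go here */
	n = buf_append(&global_buf, data, len);
	/* spin_unlock() would go here */
#else
	n = buf_append(&percpu_buf[cpu], data, len);
#endif
	return n;
}
```

Built without USE_GLOBAL_BUFFER, the write path never touches the lock counter. Built with it, every event pays for one acquisition.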


There is no impact from adding an extra layer. The trace primitives
add code for trace setup and control, but trace is not doing anything
that a relayfs user would not have to do anyway.  We mostly care about
the impact of writing data to the trace channels, and trace has no impact
there.



Is the result (e.g. the trace example module) still any faster
than printk below log level? If not then why bother?

Adding another slow logger would be just a waste of time imho.
It just means that everybody who needs a fast logger just needs
to reimplement their own anyway. And the people who can tolerate
slow loggers are probably already adequately served by
printk. Also there is already direct relayfs.


True, to make trace "fast" you need a data layer that can handle the
requirements of per-cpu buffers.  However there are still advantages of
trace over printk even when using global buffers: selectable buffer
sizes, separate data channels (not having to share data channels with
every other subsystem in the kernel), trace control, non-overwrite mode
and buffer management.
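As a rough illustration of what a non-overwrite mode buys, the following user-space sketch models a small ring that either drops new events when full (non-overwrite) or discards the oldest one (overwrite). The struct ring type, NSLOTS and the ring_put()/ring_get() helpers are hypothetical and do not reflect how relay or trace implement their buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 4	/* hypothetical number of slots in the model */

struct ring {
	int slot[NSLOTS];
	size_t head;		/* next slot to write */
	size_t count;		/* slots holding unread data */
	bool overwrite;		/* true: oldest data may be lost */
	unsigned long dropped;	/* events refused in non-overwrite mode */
};

/* Returns true if the value was stored. */
static bool ring_put(struct ring *r, int value)
{
	assert(r != NULL);
	if (r->count == NSLOTS) {
		if (!r->overwrite) {
			r->dropped++;	/* non-overwrite: keep old data */
			return false;
		}
		r->count--;		/* overwrite: oldest slot is lost */
	}
	r->slot[r->head] = value;
	r->head = (r->head + 1) % NSLOTS;
	r->count++;
	return true;
}

/* Returns the oldest unread value; the ring must not be empty. */
static int ring_get(struct ring *r)
{
	size_t tail;

	assert(r != NULL && r->count > 0);
	tail = (r->head + NSLOTS - r->count) % NSLOTS;
	r->count--;
	return r->slot[tail];
}
```

In non-overwrite mode the reader is guaranteed to see the earliest events and can learn from the drop counter how many later ones were lost; in overwrite mode it always sees the most recent ones instead.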


The next step is to provide a data layer that can take full advantage of
per-cpu buffers (systemtap shows us one example). Trace gives us a
place to build it.  As Christoph said about trace:


   "Long term we probably want more complex tracing based on lttng,
but I'm a big fan of starting out simple and doing incremental
changes."

One advantage of the trace approach is separating control and data
layers; therefore trace can support multiple data layers to fit multiple
requirements.


I have my ideas on how to develop a data layer; others may have their own
ideas and I welcome the input.


-Dave

PS: Systemtap has been criticized for introducing out-of-tree kernel 
code.  A clear direction from the community is to move re-usable code 
in-tree where it can be maintained.  Trace is a move in that direction.


Dave


-Andi





A kernel Tracing interface (was Re: -mm merge plans for 2.6.24)

2007-10-03 Thread David Wilder


Andrew-

Could you please add the trace patches to the merge list?
These patches have been very well reviewed on lkml. I believe they are 
ready to be merged.  The patches can be found here:

http://lkml.org/lkml/2007/10/2/236
http://lkml.org/lkml/2007/10/2/237
http://lkml.org/lkml/2007/10/2/238
http://lkml.org/lkml/2007/10/2/239

Dave Wilder


Re: [patch 3/3] Trace sample

2007-10-02 Thread David Wilder

Randy Dunlap wrote:

On Tue, 02 Oct 2007 09:33:25 -0700 David J. Wilder wrote:


Trace example - Adds the trace example to samples/

Signed-off-by: David Wilder <[EMAIL PROTECTED]>
---
 samples/Kconfig            |    6 ++
 samples/Makefile           |    1 +
 samples/trace/Makefile     |    4 +
 samples/trace/fork_trace.c |  132
 4 files changed, 143 insertions(+), 0 deletions(-)

diff --git a/samples/Kconfig b/samples/Kconfig
index 57bb223..e11c806 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -13,4 +13,10 @@ config SAMPLE_MARKERS
help
  This build markers example modules.
 
+config SAMPLE_TRACE

+   tristate "Build trace example -- loadable modules only"
+   depends on TRACE && m


The sample code uses kprobes, so this should also be:

depends on KPROBES

Is (are?) kprobes always needed for trace?



No, just the sample.

If so, the documentation probably should mention that also (along with
relay & debugfs).


...


Re: [PATCH 0/3] A kernel tracing interface - (updated)

2007-09-26 Thread David Wilder

Mathieu Desnoyers wrote:

* David J. Wilder ([EMAIL PROTECTED]) wrote:

These patches provide a kernel tracing interface called "trace".

(update) Moved the sample code to the new samples/ subdir

The motivation for "trace" is to:
- Provide a simple set of tracing primitives that will utilize the high
  performance and low overhead of relayfs for passing trace data from
  kernel to user space.
- Provide a common user interface for managing kernel traces.
- Allow for binary as well as ascii trace data.
- Incorporate features from the systemtap runtime that are
  useful to others.

Patches are against 2.6.23-rc6-mm1

Summary of patches:
[patch 1/3]  Trace code and documentation
[patch 2/3]  Relay Reset Consumed
[patch 3/3]  Trace sample

Note: Patches 1/3 and 2/3 must be applied together.

Note: The following patches must be applied with 3/3.
[patch 3/5] Add samples subdir
http://lkml.org/lkml/2007/9/25/157


I guess you mean:
[patch 3/5] Add samples subdir (updated)
http://lkml.org/lkml/2007/9/25/366

(please try it with this new version, it should work as is..)


Yes, it works fine with the new version of your patch.  I will update my
note for the next round of submissions.


Mathieu


[patch 4/5] Linux Kernel Markers - Samples
http://lkml.org/lkml/2007/9/25/166

Signed-off-by: David Wilder <[EMAIL PROTECTED]>








Re: [PATCH 0/3] A kernel tracing interface - (updated)

2007-09-26 Thread David Wilder

Randy Dunlap wrote:

On Wed, 26 Sep 2007 11:22:29 -0700 David J. Wilder wrote:


These patches provide a kernel tracing interface called "trace".

(update) Moved the sample code to the new samples/ subdir

The motivation for "trace" is to:
- Provide a simple set of tracing primitives that will utilize the high
  performance and low overhead of relayfs for passing trace data from
  kernel to user space.
- Provide a common user interface for managing kernel traces.
- Allow for binary as well as ascii trace data.
- Incorporate features from the systemtap runtime that are
  useful to others.

Patches are against 2.6.23-rc6-mm1

Summary of patches:
[patch 1/3]  Trace code and documentation
[patch 2/3]  Relay Reset Consumed
[patch 3/3]  Trace sample

Note: Patches 1/3 and 2/3 must be applied together.


Patch 2 provides an interface that patch 1 needs, correct?

Yes.

So yes, patches 1 & 2 need to be applied together (merged),
or their order could be reversed, yes?

2/3 should be applied at the same time as 1/3, or 2/3 can be applied
standalone.  The order in which they are applied makes no difference,
but trace will not build if the relay patch is not applied.


Can't the Relay patch be merged standalone without breaking anything?


Yes, the relay patch can be applied standalone.







Note: The following patches must be applied with 3/3.
[patch 3/5] Add samples subdir
http://lkml.org/lkml/2007/9/25/157
[patch 4/5] Linux Kernel Markers - Samples
http://lkml.org/lkml/2007/9/25/166



---
~Randy
Phaedrus says that Quality is about caring.



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC Patch] Trace - Samples

2007-09-26 Thread David Wilder

Randy Dunlap wrote:

Mathieu Desnoyers wrote:

* Randy Dunlap ([EMAIL PROTECTED]) wrote:

On Tue, 25 Sep 2007 13:35:47 -0700 David J. Wilder wrote:


This patch moves the trace example into the new samples/ infrastructure

Requires: [patch 3/5] Add samples subdir  
(http://lkml.org/lkml/2007/9/25/157)
and create samples/Makefile or apply: [patch 4/5] Linux Kernel 
Markers - Samples (http://lkml.org/lkml/2007/9/25/166)

Signed-off-by: David Wilder <[EMAIL PROTECTED]>

Looks good.  Thanks.

Acked-by: Randy Dunlap <[EMAIL PROTECTED]>

Andrew, Mathieu is making changes to patch 3/5 referenced above.
When that is done, do you need a resend of all of these samples/
patches or do you have them straight?

Basically we have:
a.  add samples/ infrastructure  (new coming from Mathieu)
b.  add markers to samples/  (Mathieu)

Mine is correct as is. (with the new samples infrastructure)



Right, your 2 can be the base (basis) for samples/, then the other
follow-ups can be applied.




c.  add trace/ to samples/  (this patch from David)


I need to re-test the trace-sample patch against Mathieu's new 
infrastructure patch.  I also need to re-post the trace patch with the 
sample removed.  I will re-post both later today.



d.  add kprobes/ to samples  (move some from Documentation)





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3] A kernel tracing interface - (updated)

2007-09-26 Thread David Wilder

Randy Dunlap wrote:

On Wed, 26 Sep 2007 11:22:29 -0700 David J. Wilder wrote:


These patches provide a kernel tracing interface called trace.

(update) Moved the sample code to the new samples/ subdir

The motivation for trace is to:
- Provide a simple set of tracing primitives that utilize the high
  performance and low overhead of relayfs for passing trace data from
  kernel to user space.
- Provide a common user interface for managing kernel traces.
- Allow for binary as well as ascii trace data.
- Incorporate features from the systemtap runtime that are
  useful to others.

Patches are against 2.6.23-rc6-mm1

Summary of patches:
[patch 1/3]  Trace code and documentation
[patch 2/3]  Relay Reset Consumed
[patch 3/3]  Trace sample

Note: Patches 1/3 and 2/3 must be applied together.


Patch 2 provides an interface that patch 1 needs, correct?

Yes.

So yes, patches 1 & 2 need to be applied together (merged),
or their order could be reversed, yes?

2/3 should be applied at the same time as 1/3, or 2/3 can be applied 
standalone.  The order in which they are applied makes no difference, but 
trace will not build if the relay patch is not applied.


Can't the Relay patch be merged standalone without breaking anything?


Yes, the relay patch can be applied standalone.







Note: The following patches must be applied with 3/3.
[patch 3/5] Add samples subdir
http://lkml.org/lkml/2007/9/25/157
[patch 4/5] Linux Kernel Markers - Samples
http://lkml.org/lkml/2007/9/25/166



---
~Randy
Phaedrus says that Quality is about caring.





Re: [PATCH 0/3] A kernel tracing interface - (updated)

2007-09-26 Thread David Wilder

Mathieu Desnoyers wrote:

* David J. Wilder ([EMAIL PROTECTED]) wrote:

These patches provide a kernel tracing interface called trace.

(update) Moved the sample code to the new samples/ subdir

The motivation for trace is to:
- Provide a simple set of tracing primitives that utilize the high
  performance and low overhead of relayfs for passing trace data from
  kernel to user space.
- Provide a common user interface for managing kernel traces.
- Allow for binary as well as ascii trace data.
- Incorporate features from the systemtap runtime that are
  useful to others.

Patches are against 2.6.23-rc6-mm1

Summary of patches:
[patch 1/3]  Trace code and documentation
[patch 2/3]  Relay Reset Consumed
[patch 3/3]  Trace sample

Note: Patches 1/3 and 2/3 must be applied together.

Note: The following patches must be applied with 3/3.
[patch 3/5] Add samples subdir
http://lkml.org/lkml/2007/9/25/157


I guess you mean:
[patch 3/5] Add samples subdir (updated)
http://lkml.org/lkml/2007/9/25/366

(please try it with this new version, it should work as is..)


Yes, it works fine with the new version of your patch.  I will update my 
note for the next round of submissions.


Mathieu


[patch 4/5] Linux Kernel Markers - Samples
http://lkml.org/lkml/2007/9/25/166

Signed-off-by: David Wilder [EMAIL PROTECTED]








Re: [Patch 1/2] Trace code and documentation (resend)

2007-09-24 Thread David Wilder

Christoph Hellwig wrote:

On Mon, Sep 24, 2007 at 08:38:34AM -0700, David Wilder wrote:

NACK, don't put code into Documentation/.  Put it into kernel as it's
actually useful kernel code.

Are you suggesting moving the example code into kernel? Or complaining 
about example code in /Documentation?


Both.  Example code should be integrated with the build system so it
gets built.


I agree, but I have not seen this done before.  Can you point me to an 
example of how to structure this?





And add clone, exec and exit while you're at it.

Huh?  A syscall tracer sounds like a nice idea, but that is not what I am 
trying to accomplish.  I will let Systemtap handle that.


Systemtap doesn't help anyone as it's not in the tree.  I haven't even
asked you to provide a full system call tracing module, but provide at
least one that's useful for a certain use-case (looking at processes)
instead of almost useless code.



I don't have a problem adding additional trace points to the example 
code.  However, anyone trying to use the code for some real purpose will 
want to tweak the code based on their needs, at a minimum to select what 
data to trace.  I don't think we gain much by adding more to the example 
other than to make it more complicated.  I am a strong believer in keeping 
example code as simple as possible.


If you are suggesting adding a separate feature for process tracing (not 
just an example), that is a good idea also.  But it should be a separate 
patch, not part of the trace patch.



Re: [Patch 1/2] Trace code and documentation (resend)

2007-09-24 Thread David Wilder

Christoph Hellwig wrote:

On Fri, Sep 21, 2007 at 09:23:28PM -0700, David J. Wilder wrote:

My last posting was mangled by my mailer.  I hope this one is better.
Also corrected Randy's concerns.

Please see previous posting for more information:
http://lkml.org/lkml/2007/9/19/4 (PATCH 0/2)

Note: this patch requires "[Patch 2/2] Relay reset consumed" is applied.

-
Trace - Provides tracing primitives

Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]>
Signed-off-by: Martin Hunt <[EMAIL PROTECTED]>
Signed-off-by: David Wilder <[EMAIL PROTECTED]>
---
 Documentation/trace/src/Makefile     |    7 +
 Documentation/trace/src/README       |   18 +
 Documentation/trace/src/fork_trace.c |  119 +++
 Documentation/trace/trace.txt        |  164 ++


NACK, don't put code into Documentation/.  Put it into kernel as it's
actually useful kernel code.  


Are you suggesting moving the example code into kernel? Or complaining 
about example code in /Documentation?


And add clone, exec and exit while you're at it.

Huh?  A syscall tracer sounds like a nice idea, but that is not what I am 
trying to accomplish.  I will let Systemtap handle that.




Re: [Patch 1/2] Trace code and documentation (updated)

2007-09-24 Thread David Wilder

Christoph Hellwig wrote:

Your mailer wrapped the patch so I can't actually apply it to start
playing with the patch.



This one should be better: http://lkml.org/lkml/2007/9/22/4
You already responded, so you must have found it.


Re: [Patch 1/2] Trace code and documentation (updated)

2007-09-19 Thread David Wilder

Randy Dunlap wrote:

On Wed, 19 Sep 2007 17:20:18 +0100 Christoph Hellwig wrote:


On Wed, Sep 19, 2007 at 07:14:47AM -0700, David Wilder wrote:
I agree with you; however, this is in the example code in the 
Documentation directory.  It is not part of the trace code.  The example 
was just meant to be a demonstration of how the interface works.

So we tell people to write bad code?  Wonderful..

And while we're at it can we please stop the dumb idea to put example
code into Documentation?  If example code doesn't get built during a
make oldconfig it will bitrot real fast and not be useful at all.



That's why the examples should not be hidden/embedded in .txt files;
they should be standalone .c files with makefiles etc.

I've built and corrected several of them, but they would be more
likely to be kept up-to-date if they are more available in standalone
files.

and they can be taken out of Documentation/ whenever they go into
util-linux-ng or elsewhere.  Let's get the order correct.

---
~Randy

IMHO keeping example code as standalone files under Documentation/* makes 
it easy to build and play with.  I like it better than keeping it on some 
project website where it is even less likely to be maintained.



Re: [Patch 1/2] Trace code and documentation (updated)

2007-09-19 Thread David Wilder

Andi Kleen wrote:

"David J. Wilder" <[EMAIL PROTECTED]> writes:

Not having read the whole thing; just something I noticed.

Gut feeling is that you have too many knobs and options and 
some overengineering though -- simplifying it would be a good thing.



+
+#define TRACE_PRINTF_TMPBUF_SIZE (1024)
+static char trace_tmpbuf[NR_CPUS][TRACE_PRINTF_TMPBUF_SIZE];


That definitely needs to be a per CPU variable. Imagine
what happens on a NR_CPUS==4096 kernel.  In general when 
you have a NR_CPUS indexed array you're likely doing something
wrong. Yes there are still places in the main tree that do that,
but most of them need to be fixed.


I agree with you; however, this is in the example code in the 
Documentation directory.  It is not part of the trace code.  The example 
was just meant to be a demonstration of how the interface works.
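
To put a number on the concern: with NR_CPUS == 4096, a static
[NR_CPUS][1024] char array reserves 4096 * 1024 bytes (4 MiB) whether or
not those CPUs exist.  The user-space sketch below is only an analogy and
is not kernel code; in the kernel the usual tool would be a per-CPU
variable (for example DEFINE_PER_CPU).  All names in it (tmpbufs_init and
friends) are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TMPBUF_SIZE 1024	/* mirrors TRACE_PRINTF_TMPBUF_SIZE */

/* Bytes reserved by a static array declared as [max_cpus][TMPBUF_SIZE]. */
static size_t static_footprint(size_t max_cpus)
{
	return max_cpus * TMPBUF_SIZE;
}

/* Hypothetical container: one buffer per CPU that is actually present. */
struct tmpbufs {
	size_t ncpus;
	char *mem;
};

/* Allocate ncpus zeroed buffers; returns 0 on success, -1 on failure. */
static int tmpbufs_init(struct tmpbufs *t, size_t ncpus)
{
	assert(t != NULL);

	t->ncpus = 0;
	t->mem = ncpus ? calloc(ncpus, TMPBUF_SIZE) : NULL;
	if (!t->mem)
		return -1;
	t->ncpus = ncpus;
	return 0;
}

/* Return the buffer belonging to one CPU, or NULL if out of range. */
static char *tmpbufs_get(struct tmpbufs *t, size_t cpu)
{
	return cpu < t->ncpus ? t->mem + cpu * TMPBUF_SIZE : NULL;
}

/* Bytes actually held by the dynamically sized set of buffers. */
static size_t tmpbufs_footprint(const struct tmpbufs *t)
{
	return t->ncpus * TMPBUF_SIZE;
}

static void tmpbufs_free(struct tmpbufs *t)
{
	free(t->mem);
	t->mem = NULL;
	t->ncpus = 0;
}
```

Sizing by the CPUs that are present (8 in a small machine gives 8 KiB)
rather than by the compile-time maximum is the point of the review
comment; the allocation strategy itself is just one way to illustrate it.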


-Andi





Re: [PATCH 0/2] A kernel tracing interface

2007-09-18 Thread David Wilder

Mathieu Desnoyers wrote:

* Andrew Morton ([EMAIL PROTECTED]) wrote:

On Tue, 18 Sep 2007 09:53:03 -0700 Vara Prasad <[EMAIL PROTECTED]> wrote:

This is part of the effort by the SystemTap team to move pieces of the 
project that are generic to mainline.

Yeah.  It seems to have been reviewed to death.  Is it ready to
be applied yet?


I would just say that this could be seen rather as a driver tracing
interface than a general purpose tracing interface. Therefore, maybe the
name is a bit misleading? And yes, what is there is good, but it does
not seem to be a replacement for a generic kernel tracer.

My suggestions to turn it into a more suitable interface for a generic
tracer are summarized in this email:
http://lkml.org/lkml/2007/9/15/136

Mathieu


Hi Mathieu
  Sorry I did not comment earlier on your email describing a generic 
tracer.  I believe that "trace" and your generic tracer could complement 
each other nicely.  The trace primitives could be expanded in the future 
to provide many of the features you described.  However, some of the 
features you describe are dependent on the format of the trace data 
itself (for example filtering), thus belong in a layer on top of trace.


Dave.


Re: [PATCH 0/2] A kernel tracing interface

2007-09-18 Thread David Wilder

Andrew Morton wrote:

On Tue, 18 Sep 2007 09:53:03 -0700 Vara Prasad <[EMAIL PROTECTED]> wrote:

This is part of the effort by the SystemTap team to move pieces of the 
project that are generic to mainline.


Yeah.  It seems to have been reviewed to death.  Is it ready to
be applied yet?

I am just finishing up incorporating your review comments.  I hope to 
have a new patch out by end.


Thanks for the support
  Dave.


Re: [PATCH 0/2] A kernel tracing interface

2007-09-18 Thread David Wilder

Andrew Morton wrote:

On Thu, 13 Sep 2007 16:43:09 -0700
David Wilder <[EMAIL PROTECTED]> wrote:


These patches provide a kernel tracing interface called "trace".

The motivation for "trace" is to:
- Provide a simple set of tracing primitives that utilize the high
  performance and low overhead of relayfs for passing trace data from
  kernel to user space.
- Provide a common user interface for managing kernel traces.
- Allow for binary as well as ascii trace data.
- Incorporate features from the systemtap runtime that are
  useful to others.

History: Versions of this code have been submitted for review under
a couple of different names.  The original submission was called UTT;
it was later re-submitted as GTSC.  Christoph Hellwig commented "The
code looks fine ...but the name is just dumb".  Following Christoph's
advice, I changed the name to simply "Trace".

This patch addresses review comments made by Christoph Hellwig and Mathieu
Desnoyers.  Changes include the addition of a mutex and synchronization
protecting trace state changes (using RCU) and the reduction of the
number of exports.

Patches are against 2.6.23-rc4-mm1

Required patches:
1/2 Trace code and documentation
2/2 Relay reset consumed  (required for trace's "rewind" feature)

Signed-off-by: David Wilder <[EMAIL PROTECTED]>


Well the code looks neat and easy enough to merge.

What exactly is the relationship between this and systemtap and kprobes and
all the other tracing things which people are doing?


The key to the relationship is relay.  Systemtap, kprobes, blktrace 
(and others) need the fast user-kernel data hose that relay provides. 
Trace is a means to simplify and standardize the use of relay. 
Systemtap adopted this code some time ago in its own runtime.  Moving 
this code into the kernel will allow other tracers to take advantage of 
the same trace primitives.



Re: [PATCH 1/2] Trace code and documentation

2007-09-14 Thread David Wilder

Sam Ravnborg wrote:

Hi David.

A random comment to the code.
Several of the struct file_operations are not declared static as
they should be.

Btw. it looks good from a coding style point-of-view.

About the name, what about ktrace?

Sam

  
Thanks for the comment. I sure don't want to change the name a fourth 
time.  Can we live with "trace"?



Re: [PATCH 1/2] Trace code and documentation

2007-09-14 Thread David Wilder

Andrew Morton wrote:


+/*
+ * Based on blktrace code, Copyright (C) 2006 Jens Axboe <[EMAIL PROTECTED]>



So can we migrate blktrace to using this?
  

Yes, a blktrace patch is coming.


+   int ret;
+
+   if (trace->flags & TRACE_DISABLE_STATE)
+   return -EINVAL;
+   
+   if (count > sizeof(buf) - 1)
+   return -EINVAL;
+
+   if (copy_from_user(buf, buffer, count))
+   return -EFAULT;
+
+   buf[count] = '\0';
+   
+   if (strncmp(buf, "start", strlen("start")) == 0) {
+   ret = trace_start(trace);
+   if (ret)
+   return ret;
+   } else if (strncmp(buf, "stop", strlen("stop")) == 0)
+   trace_stop(trace);
+   else
+   return -EINVAL;



What's the above code doing?  Trying to cope with trailing chars after
"start" or "stop"?  Is that actually needed?   It's the \n, I assume?
  


Yes, the typical usage is "echo start > state" and echo adds a \n.
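
As a rough illustration of the trailing-newline handling discussed here,
the user-space sketch below copies the written bytes into a bounded
buffer, terminates them, strips one trailing '\n' and then does an exact
compare.  This is a stricter alternative to the prefix match in the quoted
patch, not what the patch does; the names (parse_state_cmd, CMD_*) are
made up for the example, and memcpy() merely stands in for
copy_from_user().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; not taken from the patch. */
enum state_cmd { CMD_INVALID = 0, CMD_START, CMD_STOP };

/*
 * Classify a command as written by "echo start > state": the payload is
 * "start\n" and is not NUL-terminated, so copy it into a bounded local
 * buffer, terminate it, and strip one trailing newline before comparing.
 */
static enum state_cmd parse_state_cmd(const char *data, size_t count)
{
	char buf[16];

	assert(data != NULL);

	/* Reject empty writes and anything that would not fit with a NUL. */
	if (count == 0 || count > sizeof(buf) - 1)
		return CMD_INVALID;

	memcpy(buf, data, count);	/* stands in for copy_from_user() */
	buf[count] = '\0';

	if (buf[count - 1] == '\n')
		buf[count - 1] = '\0';

	/* Always compare the local copy, never the caller's pointer. */
	if (strcmp(buf, "start") == 0)
		return CMD_START;
	if (strcmp(buf, "stop") == 0)
		return CMD_STOP;
	return CMD_INVALID;
}
```

With this approach "start\n" and "start" are both accepted, while
"started\n" is rejected; the prefix match in the patch would accept all
three.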


Thanks for the comments, I will make the changes and resubmit.



[PATCH 2/2] Relay reset consumed

2007-09-13 Thread David Wilder


This patch allows the consumed counts of relay channels to be reset,
i.e. data that has already been read becomes 'unconsumed' again.
This provides a 'rewind' function for flight-recorder tracing.
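
The counter arithmetic in __relay_reset_consumed() from the diff in this
mail can be modelled in user space roughly as follows.  This is a
simplified sketch: struct buf_model is a hypothetical stand-in that
carries only the fields the function touches, not the real
struct rchan_buf, and no locking or per-CPU iteration is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the counters kept per channel buffer. */
struct buf_model {
	size_t n_subbufs;		/* sub-buffers in the ring */
	size_t subbufs_produced;	/* sub-buffers filled so far */
	size_t subbufs_consumed;	/* sub-buffers read so far */
	size_t bytes_consumed;		/* bytes read in current sub-buffer */
	size_t offset;			/* write offset in current sub-buffer */
};

/*
 * Rewind the read position to the oldest data still in the ring.
 * If the ring has not wrapped yet, that is sub-buffer 0.  Otherwise it
 * is produced - n_subbufs, plus one when a write is in progress in the
 * current sub-buffer (offset != 0), since that one is being overwritten.
 */
static void model_reset_consumed(struct buf_model *b)
{
	assert(b != NULL);

	if (b->subbufs_produced < b->n_subbufs) {
		b->subbufs_consumed = 0;
	} else {
		size_t consumed = b->subbufs_produced - b->n_subbufs;

		if (b->offset)
			consumed++;
		b->subbufs_consumed = consumed;
	}
	b->bytes_consumed = 0;
}
```

For example, with 4 sub-buffers and 10 produced, the rewind point is
sub-buffer 6 when the current sub-buffer is empty and 7 when it has been
partly written.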

Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]>
Signed-off-by: David Wilder <[EMAIL PROTECTED]>
---
 Documentation/filesystems/relay.txt |   11 ++
 include/linux/relay.h               |    1 +
 kernel/relay.c                      |   58 ---
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.txt
index 18d23f9..d31113a 100644
--- a/Documentation/filesystems/relay.txt
+++ b/Documentation/filesystems/relay.txt
@@ -161,6 +161,7 @@ TBD(curr. line MT:/API/)
 relay_close(chan)
 relay_flush(chan)
 relay_reset(chan)
+relay_reset_consumed(chan)
 
   channel management typically called on instigation of userspace:
 
@@ -452,6 +453,16 @@ state without reallocating channel buffer memory or destroying
 existing mappings.  It should however only be called when it's safe to
 do so, i.e. when the channel isn't currently being written to.
 
+The read(2) implementation always 'consumes' the bytes read,
+i.e. those bytes won't be available again to subsequent reads.
+Certain applications may nonetheless wish to allow the 'consumed' data
+to be re-read; relay_reset_consumed() is provided for that purpose -
+it resets the internal consumed counters for all buffers in the
+channel.  For example, if a first set of reads 'drains' the channel,
+and then relay_reset_consumed() is called, a second set of reads will
+get the exact same data (assuming no new data was written between the
+first set of reads and the second).
+
 Finally, there are a couple of utility callbacks that can be used for
 different purposes.  buf_mapped() is called whenever a channel buffer
 is mmapped from user space and buf_unmapped() is called when it's
diff --git a/include/linux/relay.h b/include/linux/relay.h
index 6cd8c44..aca45fa 100644
--- a/include/linux/relay.h
+++ b/include/linux/relay.h
@@ -175,6 +175,7 @@ extern void relay_subbufs_consumed(struct rchan *chan,
    unsigned int cpu,
    size_t consumed);
 extern void relay_reset(struct rchan *chan);
+extern void relay_reset_consumed(struct rchan *chan);
 extern int relay_buf_full(struct rchan_buf *buf);
 
 extern size_t relay_switch_subbuf(struct rchan_buf *buf,
diff --git a/kernel/relay.c b/kernel/relay.c
index 61134eb..106ce92 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -383,6 +383,57 @@ void relay_reset(struct rchan *chan)
 }
 EXPORT_SYMBOL_GPL(relay_reset);
 
+/**
+ *	__relay_reset_consumed - reset a channel buffer's consumed count
+ *	@buf: the channel buffer
+ *
+ *	See relay_reset_consumed for description of effect.
+ */
+static inline void __relay_reset_consumed(struct rchan_buf *buf)
+{
+	size_t n_subbufs = buf->chan->n_subbufs;
+	size_t produced = buf->subbufs_produced;
+	size_t consumed = buf->subbufs_consumed;
+
+	if (produced < n_subbufs)
+		buf->subbufs_consumed = 0;
+	else {
+		consumed = produced - n_subbufs;
+		if (buf->offset)
+			consumed++;
+		buf->subbufs_consumed = consumed;
+	}
+	buf->bytes_consumed = 0;
+}
+
+/**
+ *	relay_reset_consumed - reset the channel's consumed counts
+ *	@chan: the channel
+ *
+ *	This has the effect of making all data previously read (and
+ *	not overwritten by subsequent writes) from a channel available
+ *	for reading again.
+ *
+ *	NOTE: Care should be taken that the channel isn't actually
+ *	being used by anything when this call is made.
+ */
+void relay_reset_consumed(struct rchan *chan)
+{
+	unsigned int i;
+	struct rchan_buf *prev = NULL;
+
+	if (!chan)
+		return;
+
+	for (i = 0; i < NR_CPUS; i++) {
+		if (!chan->buf[i] || chan->buf[i] == prev)
+			break;
+		__relay_reset_consumed(chan->buf[i]);
+		prev = chan->buf[i];
+	}
+}
+EXPORT_SYMBOL_GPL(relay_reset_consumed);
+
 /*
  *	relay_open_buf - create a new relay channel buffer
  *
@@ -845,11 +896,8 @@ static int relay_file_read_avail(struct rchan_buf *buf, size_t read_pos)
 		return 1;
 	}
 
-	if (unlikely(produced - consumed >= n_subbufs)) {
-		consumed = produced - n_subbufs + 1;
-		buf->subbufs_consumed = consumed;
-		buf->bytes_consumed = 0;
-	}
+	if (unlikely(produced - consumed >= n_subbufs)) 
+		__relay_reset_consumed(buf);
 
 	produced = (produced % n_subbufs) * subbuf_size + buf->offset;
 	consumed = (consumed % n_subbufs) * subbuf_size + buf->bytes_consumed;
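
As a rough illustration of the arithmetic in __relay_reset_consumed() above, here is a minimal user-space sketch. The struct model_buf type and the model_reset_consumed() name are hypothetical stand-ins for the few rchan_buf fields the patch touches; this is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the rchan_buf fields used by the patch. */
struct model_buf {
	size_t n_subbufs;        /* sub-buffers in the channel */
	size_t subbufs_produced; /* sub-buffers produced so far */
	size_t subbufs_consumed; /* sub-buffers consumed so far */
	size_t bytes_consumed;   /* bytes read from the current sub-buffer */
	size_t offset;           /* write offset in the current sub-buffer */
};

/* Mirrors the arithmetic of __relay_reset_consumed() from the patch. */
static void model_reset_consumed(struct model_buf *buf)
{
	assert(buf->n_subbufs > 0);

	if (buf->subbufs_produced < buf->n_subbufs) {
		/* The buffer has not wrapped yet: rewind to the start. */
		buf->subbufs_consumed = 0;
	} else {
		/* Rewind by one full channel's worth of sub-buffers. */
		size_t consumed = buf->subbufs_produced - buf->n_subbufs;

		/* The writer is part-way into the current sub-buffer,
		 * so the patch skips one more sub-buffer. */
		if (buf->offset)
			consumed++;
		buf->subbufs_consumed = consumed;
	}
	buf->bytes_consumed = 0;
}
```

With 4 sub-buffers and 9 produced, the model rewinds the consumed count to 5 when the write offset is zero and to 6 otherwise.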


[PATCH 1/2] Trace code and documentation

2007-09-13 Thread David Wilder


Trace - Provides tracing primitives

Tom Zanussi <[EMAIL PROTECTED]>
Martin Hunt <[EMAIL PROTECTED]>
David Wilder <[EMAIL PROTECTED]>

---
 Documentation/trace.txt |  297 
 include/linux/trace.h   |   99 
 lib/Kconfig |   10 +
 lib/Makefile|2 +
 lib/trace.c |  575 +++
 5 files changed, 983 insertions(+), 0 deletions(-)

diff --git a/Documentation/trace.txt b/Documentation/trace.txt
new file mode 100644
index 000..57a5c71
--- /dev/null
+++ b/Documentation/trace.txt
@@ -0,0 +1,297 @@
+Trace Setup and Control
+=======================
+In the kernel, the trace interface provides a simple mechanism for
+starting and managing data channels (traces) to user space.  The
+trace interface builds on the relay interface.  For a complete
+description of the relay interface, please see:
+Documentation/filesystems/relay.txt.
+
+The trace interface provides a single layer in a complete tracing
+application.  Trace provides a kernel API that can be used for the setup
+and control of tracing channels.  Users of trace must provide a data layer
+responsible for formatting and writing data into the trace channels.
+
+A layered approach to tracing
+=============================
+A complete kernel tracing application consists of a data provider and
+a data consumer.  Both provider and consumer contain three layers; each
+layer works in tandem with the corresponding layer on the opposite side.
+The layers are represented in the following diagram.
+
+Provider Data layer
+	Formats raw trace data and provides data-related services,
+	for example, adding timestamps used by the consumer to sort data.
+
+Provider Control layer
+	Provided by the trace interface, this layer creates trace channels
+	and informs the data layer and consumer of the current state
+	of the trace channels.
+
+Provider Buffering layer
+	Provided by relay. This layer buffers data in the
+	kernel for consumption by the consumer's buffer
+	layer.
+
+Provider (in-kernel facility)
+----------------------------------------------------------------------
+Consumer (user application)
+
+
+Consumer Buffer layer
+	Reads/consumes data from the provider's data buffers.
+
+Consumer Control layer
+	Communicates to the provider's control layer to control the state
+	of the trace channels.
+
+Consumer Data layer
+	Sorts and formats data as provided by the provider's data layer.
+
+The provider is coded as a kernel facility.  The consumer is coded as
+a user application.
+
+
+Trace - Features
+
+Trace exploits services and features provided by relay.  These features
+are:
+- The creation and destruction of relay channels.
+- Buffer management.  Overwrite or non-overwrite modes can be selected
+  as well as global or per-CPU buffering.
+
+Overwrite mode can be called "flight recorder mode".  Flight recorder
+mode is selected by setting the TRACE_FLIGHT_CHANNEL flag when
+creating trace channels.  In flight mode, when a tracing buffer is
+full, the oldest records in the buffer will be discarded to make room
+as new records arrive.  In the default non-overwrite mode, new records
+may be written only if the buffer has room.  In either case, to
+prevent data loss, a user space reader must keep the buffers
+drained. Trace provides a means to detect the number of records that
+have been dropped due to a buffer-full condition (non-overwrite mode
+only).
+
+When per-CPU buffers are used, relay creates one debugfs file for each
+running CPU.  The user-space consumer of the data is responsible for
+reading the per-CPU buffers and collating the records, presumably using
+a time stamp or sequence number included in the trace records.  The
+use of global buffers eliminates this extra work of sequencing
+records; however the provider's data layer must hold a lock when
+writing records.  The lock prevents writers running on different CPUs
+from overwriting each other's data.  However, buffering may be slower
+because writes to the buffer are serialized. Global buffering is
+selected by setting the TRACE_GLOBAL_CHANNEL flag when creating trace
+channels.
+
+Trace User Interface
+====================
+When a trace channel is created and started, the following
+directories and files are created in the root of the mounted debugfs.
+
+/debug (root of the debugfs)
+	/<trace-root-dir>
+		/<trace-name>
+			trace0 ... traceN  Per-CPU trace data, one
+					   file per CPU.
+
+			state		   Start or stop tracing by
+					   writing the strings
+					   "start" or "stop" to this
+					   file. Read the file to get the
+					   current state.
+
+			dropped		   The number of records dropped
+					   due to a full-buffer condition,
+					   for non-TRACE_FLIGHT_CHANNELs
+					   only.
+
+			rewind		   Trigger a rewind by writing
+					   to this file, i.e. start
+					   next read at the beginning
+					   again. Only available for
+					   TRACE_FLIGHT_CHANNELS.
+
+
+			nr_sub		   Number
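
From the consumer side, the "state" control file described above could be driven with ordinary file I/O. The following is a minimal user-space sketch, not part of the patch: trace_write_state() and trace_read_state() are hypothetical helper names, the path is supplied by the caller, and the exact string returned when reading the file is not specified in the text above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write a command such as "start" or "stop" to a trace's state file.
 * state_path is supplied by the caller.  Returns 0 on success, -1 on error.
 */
static int trace_write_state(const char *state_path, const char *cmd)
{
	FILE *f = fopen(state_path, "w");

	if (!f)
		return -1;
	if (fputs(cmd, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes, so a failed write is reported here. */
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Read the current state string into buf (at most len - 1 characters),
 * stripping a trailing newline if present.  Returns 0 on success.
 */
static int trace_read_state(const char *state_path, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;
	f = fopen(state_path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A consumer would call trace_write_state() with the path of the state file under its trace directory and the string "start" or "stop".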

[PATCH 0/2] A kernel tracing interface

2007-09-13 Thread David Wilder

These patches provide a kernel tracing interface called "trace".

The motivation for "trace" is to:
- Provide a simple set of tracing primitives that utilize the high
 performance and low overhead of relayfs for passing trace data from
 kernel to user space.
- Provide a common user interface for managing kernel traces.
- Allow for binary as well as ascii trace data.
- Incorporate features from the systemtap runtime that are
 useful to others.

History: Versions of this code have been submitted for review under
a couple of different names.  The original submission was called UTT;
it was later re-submitted as GTSC.  Christoph Hellwig commented "The
code looks fine ...but the name is just dumb".  Following Christoph's
advice, I changed the name to simply "Trace".

This patch addresses review comments made by Christoph Hellwig and Mathieu
Desnoyers.  Changes include the addition of a mutex and synchronization
protecting trace state changes (using RCU) and the reduction of the
number of exports.

Patches are against 2.6.23-rc4-mm1

Required patches:
1/2 Trace code and documentation
2/2 Relay reset consumed  (required for trace's "rewind" feature)

Signed-off-by: David Wilder <[EMAIL PROTECTED]>



[patch] s390 kprobe fix instruction length calculation

2007-08-15 Thread David Wilder

Placing a kprobe on a "bc" instruction (s390/s390x) can cause an oops.
The instruction length is encoded in the first two bits of an s390
instruction, but kprobes computes it incorrectly.  The length is used to
determine what type of "fix-up" is needed for a conditional branch
instruction.  The problem can be seen by placing a kprobe on a "bc"
instruction that will not branch: kprobes then computes the wrong new
instruction pointer (psw.addr) after single-stepping the instruction.
This patch corrects the problem.




 arch/s390/kernel/kprobes.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index 358d2bb..e40373d 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -85,7 +85,7 @@ void __kprobes get_instruction_type(struct arch_specific_insn *ainsn)
 	ainsn->reg = (*ainsn->insn & 0xf0) >> 4;
 
 	/* save the instruction length (pop 5-5) in bytes */
-	switch (*(__u8 *) (ainsn->insn) >> 4) {
+	switch (*(__u8 *) (ainsn->insn) >> 6) {
 	case 0:
 		ainsn->ilen = 2;
 		break;
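
To make the fix concrete, here is a small stand-alone sketch of decoding the length from the top two bits of the first instruction byte. The function name s390_insn_length() is hypothetical, and the 4- and 6-byte cases follow the standard s390 encoding (00 = 2 bytes, 01 or 10 = 4 bytes, 11 = 6 bytes) rather than anything shown in the hunk above, which only includes case 0.

```c
#include <stdint.h>

/*
 * Length in bytes of an s390 instruction, taken from the two most
 * significant bits of its first byte.  Shifting by 6 isolates those
 * two bits; shifting by 4 (the old code) keeps four bits instead.
 */
static int s390_insn_length(uint8_t first_byte)
{
	switch (first_byte >> 6) {
	case 0:
		return 2;
	case 1:
	case 2:
		return 4;
	default:	/* 3 */
		return 6;
	}
}
```

For "bc" (opcode 0x47) the shift by 6 gives 1, i.e. a 4-byte instruction, whereas the old shift by 4 gives 4, which is not a valid two-bit length code.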


Re: [RFC] Generic Trace Setup and Control (GTSC) kernel API (2/3)

2007-06-21 Thread David Wilder

I forgot to CC the list in my response to Alexey.

I plan to address Alexey's concerns in a couple of days (as soon as I 
get past the OLS push).


Alexey Dobriyan wrote:

Can we get another user to justify this generalizing?

Systemtap has plans to use the GTSC also.

--
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA 
[EMAIL PROTECTED]

(503)578-3789



Re: [PATCH] relay-file-read-start-pos-fix.patch

2007-06-21 Thread David Wilder

Ack
Works for me.  Thanks.

Note:
Both Masami's patch and the relay-file-read-start-pos-fix.patch I posted 
earlier are required.


Masami Hiramatsu wrote:

Hi Tom,

Tom Zanussi wrote:

Could you send more info on how to reproduce the problem you're seeing?
And does this patch fix it?

Sure, I'll explain how to reproduce it.

Since SystemTap does not currently support "overwrite" mode,
you need to apply a patch before trying to reproduce it.
I already posted the patch to bugzilla. You can get it from the link below.
http://sourceware.org/bugzilla/attachment.cgi?id=1896&action=view

Here is an example script (fillup.stp).

global counter=0
probe timer.ms(1) {
   counter++;
   printf("%08d : %020d\n", counter, gettimeofday_ns());
}


First of all, run the script with the -O (overwrite mode) flag.
(To simplify my explanation, I also use the -m flag here.)
$ stap -O fillup.stp -m fillup
Soon after starting, press ^\ (Ctrl+\) to detach from it.

The script writes 32 bytes of dummy data every millisecond, so
it writes about 32 kB per second.
The default size of a systemtap relay channel is 512 kB,
made up of 4 subbufs of 128 kB each.
Thus, the script fills the relay channel in about 16 seconds and
then wraps around, because it uses overwrite mode.

So, wait more than 16 seconds (for example, 18 seconds),
read the relay channel and count the number of lines.
Then repeat.
$ while true; do sleep 18; \
cat /sys/kernel/debug/systemtap/fillup/trace0 | wc -l; done

Ideally, it will show a number between 12288 (=3*128k/32) and
16384 (=4*128k/32).
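
The figures quoted above can be checked with a short calculation. This is only a sketch of the arithmetic in the message; struct fill_estimate and estimate_fill() are hypothetical names, and the lower bound simply reproduces the message's assumption that at least n_subbufs - 1 sub-buffers' worth of records should be readable.

```c
#include <assert.h>
#include <stddef.h>

struct fill_estimate {
	size_t bytes_per_sec;	/* write rate into the channel */
	size_t secs_to_fill;	/* whole seconds until the channel wraps */
	size_t min_records;	/* records held by n_subbufs - 1 sub-buffers */
	size_t max_records;	/* records held by all n_subbufs sub-buffers */
};

static struct fill_estimate estimate_fill(size_t record_bytes,
					  size_t records_per_sec,
					  size_t subbuf_bytes,
					  size_t n_subbufs)
{
	struct fill_estimate e;
	size_t channel_bytes = subbuf_bytes * n_subbufs;

	assert(record_bytes > 0 && records_per_sec > 0 && n_subbufs > 0);

	e.bytes_per_sec = record_bytes * records_per_sec;
	e.secs_to_fill = channel_bytes / e.bytes_per_sec;
	e.min_records = (n_subbufs - 1) * subbuf_bytes / record_bytes;
	e.max_records = channel_bytes / record_bytes;
	return e;
}
```

Calling estimate_fill(32, 1000, 128 * 1024, 4) gives 32000 bytes per second, 16 whole seconds to fill, and a range of 12288 to 16384 records, matching the numbers in the message.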

However, without my patch, it shows;
4793
5721
9818
9817
9819
13912
0
0
780
1625
5723
5721

And, with my patch;
15742
13273
14901
12430
14056
15682
13215
14840
12370
13996
15624
13154


So, I think my patch fixes the problem.

Thanks,

P.S.
I attached my patch (relay-file-read-overwrite-mode-fix.patch),
which fixes the problem pointed out in the previous mail.

Signed-off-by: Masami Hiramatsu <[EMAIL PROTECTED]>

---
kernel/relay.c |8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc4-mm2/kernel/relay.c
===
--- linux-2.6.22-rc4-mm2.orig/kernel/relay.c	2007-06-13 20:22:02.0 +0900
+++ linux-2.6.22-rc4-mm2/kernel/relay.c 2007-06-20 10:53:06.0 +0900
@@ -812,7 +812,10 @@
 	}
 
 	buf->bytes_consumed += bytes_consumed;
-	read_subbuf = read_pos / buf->chan->subbuf_size;
+	if (!read_pos)
+		read_subbuf = buf->subbufs_consumed % n_subbufs;
+	else
+		read_subbuf = read_pos / buf->chan->subbuf_size;
 	if (buf->bytes_consumed + buf->padding[read_subbuf] == subbuf_size) {
 		if ((read_subbuf == buf->subbufs_produced % n_subbufs) &&
 		    (buf->offset == subbuf_size))
@@ -841,8 +844,9 @@
 	}
 
 	if (unlikely(produced - consumed >= n_subbufs)) {
-		consumed = (produced / n_subbufs) * n_subbufs;
+		consumed = produced - n_subbufs + 1;
 		buf->subbufs_consumed = consumed;
+		buf->bytes_consumed = 0;
 	}
 
 	produced = (produced % n_subbufs) * subbuf_size + buf->offset;





 




--
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA 
[EMAIL PROTECTED]

(503)578-3789



[PATCH] relay-file-read-start-pos-fix.patch

2007-06-17 Thread David Wilder


--
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA 
[EMAIL PROTECTED]

(503)578-3789

This patch fixes a bug in the relay read interface causing the number
of consumed bytes to be set incorrectly. 

Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]>
Signed-off-by: David Wilder <[EMAIL PROTECTED]>

 kernel/relay.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/relay.c b/kernel/relay.c
index 4311101..e61156e 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -899,7 +899,10 @@ static size_t relay_file_read_start_pos(size_t read_pos,
 	size_t read_subbuf, padding, padding_start, padding_end;
 	size_t subbuf_size = buf->chan->subbuf_size;
 	size_t n_subbufs = buf->chan->n_subbufs;
+	size_t consumed = buf->subbufs_consumed % n_subbufs;
 
+	if (!read_pos)
+		read_pos = consumed * subbuf_size + buf->bytes_consumed;
 	read_subbuf = read_pos / subbuf_size;
 	padding = buf->padding[read_subbuf];
 	padding_start = (read_subbuf + 1) * subbuf_size - padding;
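
The effect of the three added lines can be sketched in isolation. The helper below is a hypothetical user-space model, not the kernel function: it shows only how a zero read position is replaced by the position implied by the consumed counters, and leaves out the padding handling that follows in relay_file_read_start_pos().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of the added logic: a read position of 0 is replaced by the
 * byte offset of the first unconsumed byte in the channel buffer.
 */
static size_t model_read_start(size_t read_pos,
			       size_t subbufs_consumed,
			       size_t bytes_consumed,
			       size_t subbuf_size,
			       size_t n_subbufs)
{
	assert(n_subbufs > 0);

	if (!read_pos) {
		size_t consumed = subbufs_consumed % n_subbufs;

		read_pos = consumed * subbuf_size + bytes_consumed;
	}
	return read_pos;
}
```

For example, with 4 sub-buffers of 1024 bytes, 5 sub-buffers consumed and 100 bytes consumed in the current one, a read starting at position 0 is moved to offset 1124, while a non-zero read position is left unchanged.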


[RFC] Generic Trace Setup and Control (GTSC) kernel API (2/3)

2007-06-17 Thread David Wilder


--
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA 
[EMAIL PROTECTED]

(503)578-3789

This patch introduces the Generic Trace Setup and Control (GTSC) API.
In the kernel, GTSC provides a simple API for starting and managing
data channels to user space.  GTSC builds on the relay interface.
The documentation for the GTSC is provided in a separate patch.

Signed-off-by: David Wilder <[EMAIL PROTECTED]>

 include/linux/gtsc.h |  124 
 lib/Kconfig  |9 ++
 lib/Makefile |2 +
 lib/gtsc.c   |  385 ++
 4 files changed, 520 insertions(+), 0 deletions(-)

diff --git a/include/linux/gtsc.h b/include/linux/gtsc.h
new file mode 100644
index 000..224aa16
--- /dev/null
+++ b/include/linux/gtsc.h
@@ -0,0 +1,124 @@
+/*
+ * gtsc.h - GTSC defines and function prototypes
+ *
+ * Copyright (C) 2006 IBM Inc.
+ *
+ *	Tom Zanussi <[EMAIL PROTECTED]>
+ *	Martin Hunt <[EMAIL PROTECTED]>
+ *	David Wilder <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ */
+#ifndef _LINUX_GTSC_H
+#define _LINUX_GTSC_H
+
+#include <linux/relay.h>
+
+/*
+ * GTSC channel flags
+ */
+#define GTSC_GLOBAL	0x01
+#define GTSC_FLIGHT	0x02
+
+enum {
+	Gtsc_trace_setup = 1,
+	Gtsc_trace_running,
+	Gtsc_trace_stopped,
+};
+
+#define GTSC_TRACE_ROOT_NAME_SIZE	64	/* Max root dir identifier */
+#define GTSC_TRACE_NAME_SIZE		64	/* Max trace identifier */
+
+/*
+ * Global root user information
+ */
+struct gtsc_root {
+	struct list_head list;
+	char gtsc_name[GTSC_TRACE_ROOT_NAME_SIZE];
+	struct dentry *gtsc_root;
+	unsigned int gtsc_users;
+};
+
+/*
+ * Client information
+ */
+struct gtsc_trace {
+	int trace_state;
+	struct dentry *state_file;
+	struct rchan *rchan;
+	struct dentry *dir;
+	struct dentry *dropped_file;
+	atomic_t dropped;
+	struct gtsc_root *root;
+	void *private_data;
+	unsigned int flags;
+	unsigned int buf_size;
+	unsigned int buf_nr;
+};
+
+static inline int gtsc_trace_running(struct gtsc_trace *gtsc)
+{
+	return gtsc->trace_state == Gtsc_trace_running;
+}
+
+#if defined(CONFIG_GTSC)
+
+/**
+ *	gtsc_trace_setup: create a new gtsc trace handle
+ *
+ *	@root: The root directory name in the root of the debugfs
+ *	   to place trace directories. Created as needed.
+ *	@name: Trace directory name, created in @root
+ *	@buf_size: size of the relay sub-buffers
+ *	@buf_nr: number of relay sub-buffers
+ *	@flags: Option selection (see GTSC channel flags definitions)
+ *		default values when flags=0 are: use per-CPU buffering,
+ *		use non-overwrite mode. See Documentation/gtsc.txt for details.
+ *
+ *	returns a gtsc_trace handle or NULL, if setup failed.
+ */
+extern struct gtsc_trace *gtsc_trace_setup(char *root, char *name, u32 buf_size,
+	 u32 buf_nr, u32 flags);
+
+/**
+ *	gtsc_trace_startstop: start or stop tracing.
+ *
+ *	@gtsc: gtsc trace handle to start or stop.
+ *	@start: set to 1 to start tracing, or to 0 to stop.
+ *
+ *	returns 0 if successful.
+ */
+extern int gtsc_trace_startstop(struct gtsc_trace *gtsc, int start);
+
+/**
+ *	gtsc_trace_cleanup: destroys the gtsc channel.
+ *
+ *	@gtsc: gtsc trace handle to cleanup
+ */
+extern void gtsc_trace_cleanup(struct gtsc_trace *gtsc);
+
+/**
+ *	gtsc_timestamp: returns a time stamp.
+ */
+extern unsigned long long  gtsc_timestamp(void);
+
+#else /* !CONFIG_GTSC */
+#define gtsc_trace_setup(root, name, buf_size, buf_nr, flags)	(NULL)
+#define gtsc_trace_startstop(gtsc, start)	(-EINVAL)
+#define gtsc_trace_cleanup(gtsc)		do { } while (0)
+#define gtsc_timestamp(void) 			(unsigned long long) (0)
+#endif /* CONFIG_GTSC */
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 2e7ae6b..d6e048f 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -124,4 +124,13 @@ config HAS_DMA
 	depends on !NO_DMA
 	default y
 
+config GTSC
+	bool "Generic Trace Setup and Control"
+	select RELAY
+	select DEBUG_FS
+	help
+	This option enables support for the GTSC. 
+
+	If unsure, say N.
+
 endmenu
diff --git a/lib/Makefile b/lib/Makefile
index c8c8e20..dcbdb5e 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -62,6 +62,8 @@ obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
 
 lib-$(CONFIG_GENERIC_BUG) += bug.o
 
+obj-$(CONFIG_GTSC) += gtsc.o
+
 hostprogs-y	:= gen_crc32table
 clean-files	:= crc32table.h
 
diff --git a/lib/gtsc.c b/lib/gtsc.c
new file mode 100644
index 000..c376006
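
The call sequence implied by the prototypes in gtsc.h can be sketched in user space. This is a mock under stated assumptions, not the contents of lib/gtsc.c: the mock_ functions, the simplified struct mock_trace, the validity check in setup and the state transitions are all hypothetical. Only the flag values, the state names and the return conventions (NULL on failed setup, 0 on a successful start or stop) are taken from the header in this patch.

```c
#include <stdlib.h>

/* Flag and state values copied from the gtsc.h in this patch. */
#define GTSC_GLOBAL 0x01
#define GTSC_FLIGHT 0x02

enum {
	Gtsc_trace_setup = 1,
	Gtsc_trace_running,
	Gtsc_trace_stopped,
};

/* Hypothetical, simplified stand-in for struct gtsc_trace. */
struct mock_trace {
	int trace_state;
	unsigned int flags;
	unsigned int buf_size;
	unsigned int buf_nr;
};

/* Mock of gtsc_trace_setup(): returns a handle, or NULL if setup failed. */
static struct mock_trace *mock_trace_setup(unsigned int buf_size,
					   unsigned int buf_nr,
					   unsigned int flags)
{
	struct mock_trace *t;

	if (!buf_size || !buf_nr)	/* assumed validity check */
		return NULL;
	t = calloc(1, sizeof(*t));
	if (!t)
		return NULL;
	t->trace_state = Gtsc_trace_setup;
	t->flags = flags;
	t->buf_size = buf_size;
	t->buf_nr = buf_nr;
	return t;
}

/* Mock of gtsc_trace_startstop(): returns 0 if successful. */
static int mock_trace_startstop(struct mock_trace *t, int start)
{
	if (!t)
		return -1;
	t->trace_state = start ? Gtsc_trace_running : Gtsc_trace_stopped;
	return 0;
}

/* Same comparison as the gtsc_trace_running() inline in gtsc.h. */
static int mock_trace_running(const struct mock_trace *t)
{
	return t->trace_state == Gtsc_trace_running;
}

/* Mock of gtsc_trace_cleanup(): releases the handle. */
static void mock_trace_cleanup(struct mock_trace *t)
{
	free(t);
}
```

A client would call setup once, toggle tracing with startstop as needed, check the running state before writing records, and call cleanup only after tracing has been stopped, which is the order the blktrace conversion in patch 3/3 follows.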

[RFC] Generic Trace Setup and Control (GTSC) kernel API (3/3)

2007-06-17 Thread David Wilder

Patches to convert blktrace to the new GTSC API.
Two patches are included: the first is for the kernel portion of
blktrace, and the second is for the blktrace user code.


--
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA 
[EMAIL PROTECTED]

(503)578-3789

This patch converts the blktrace facility to use the Generic Trace
Setup and Control (GTSC) API.  (kernel patch)

Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]>
Signed-off-by: David Wilder <[EMAIL PROTECTED]>

diff --git a/block/Kconfig b/block/Kconfig
index a50f481..9ae9a8c 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -30,7 +30,7 @@ config LBD
 config BLK_DEV_IO_TRACE
 	bool "Support for tracing block io actions"
 	depends on SYSFS
-	select RELAY
+	select GTSC
 	select DEBUG_FS
 	help
 	  Say Y here, if you want to be able to trace the block layer actions
diff --git a/block/blktrace.c b/block/blktrace.c
index 3f0e7c3..b4acf89 100644
--- a/block/blktrace.c
+++ b/block/blktrace.c
@@ -36,7 +36,7 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
 {
 	struct blk_io_trace *t;

-	t = relay_reserve(bt->rchan, sizeof(*t) + len);
+	t = relay_reserve(bt->gtsc->rchan, sizeof(*t) + len);
 	if (t) {
 		const int cpu = smp_processor_id();

@@ -126,7 +126,7 @@ void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
 	pid_t pid;
 	int cpu;

-	if (unlikely(bt->trace_state != Blktrace_running))
+	if (unlikely(!gtsc_trace_running(bt->gtsc)))
 		return;

 	what |= ddir_act[rw & WRITE];
@@ -152,7 +152,7 @@ void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
 	if (unlikely(tsk->btrace_seq != blktrace_seq))
 		trace_note_tsk(bt, tsk);

-	t = relay_reserve(bt->rchan, sizeof(*t) + pdu_len);
+	t = relay_reserve(bt->gtsc->rchan, sizeof(*t) + pdu_len);
 	if (t) {
 		cpu = smp_processor_id();
 		sequence = per_cpu_ptr(bt->sequence, cpu);
@@ -178,55 +178,8 @@ void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,

 EXPORT_SYMBOL_GPL(__blk_add_trace);

-static struct dentry *blk_tree_root;
-static struct mutex blk_tree_mutex;
-static unsigned int root_users;
-
-static inline void blk_remove_root(void)
-{
-	if (blk_tree_root) {
-		debugfs_remove(blk_tree_root);
-		blk_tree_root = NULL;
-	}
-}
-
-static void blk_remove_tree(struct dentry *dir)
-{
-	mutex_lock(&blk_tree_mutex);
-	debugfs_remove(dir);
-	if (--root_users == 0)
-		blk_remove_root();
-	mutex_unlock(&blk_tree_mutex);
-}
-
-static struct dentry *blk_create_tree(const char *blk_name)
-{
-	struct dentry *dir = NULL;
-
-	mutex_lock(&blk_tree_mutex);
-
-	if (!blk_tree_root) {
-		blk_tree_root = debugfs_create_dir("block", NULL);
-		if (!blk_tree_root)
-			goto err;
-	}
-
-	dir = debugfs_create_dir(blk_name, blk_tree_root);
-	if (dir)
-		root_users++;
-	else
-		blk_remove_root();
-
-err:
-	mutex_unlock(&blk_tree_mutex);
-	return dir;
-}
-
 static void blk_trace_cleanup(struct blk_trace *bt)
 {
-	relay_close(bt->rchan);
-	debugfs_remove(bt->dropped_file);
-	blk_remove_tree(bt->dir);
 	free_percpu(bt->sequence);
 	kfree(bt);
 }
@@ -239,76 +192,14 @@ static int blk_trace_remove(request_queue_t *q)
 	if (!bt)
 		return -EINVAL;

-	if (bt->trace_state == Blktrace_setup ||
-	bt->trace_state == Blktrace_stopped)
+	if (!gtsc_trace_running(bt->gtsc)) {
+		gtsc_trace_cleanup(bt->gtsc);
 		blk_trace_cleanup(bt);
+	}

 	return 0;
 }

-static int blk_dropped_open(struct inode *inode, struct file *filp)
-{
-	filp->private_data = inode->i_private;
-
-	return 0;
-}
-
-static ssize_t blk_dropped_read(struct file *filp, char __user *buffer,
-size_t count, loff_t *ppos)
-{
-	struct blk_trace *bt = filp->private_data;
-	char buf[16];
-
-	snprintf(buf, sizeof(buf), "%u\n", atomic_read(&bt->dropped));
-
-	return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
-}
-
-static const struct file_operations blk_dropped_fops = {
-	.owner =	THIS_MODULE,
-	.open =		blk_dropped_open,
-	.read =		blk_dropped_read,
-};
-
-/*
- * Keep track of how many times we encountered a full subbuffer, to aid
- * the user space app in telling how many lost events there were.
- */
-static int blk_subbuf_start_callback(struct rchan_buf *buf, void *subbuf,
- void *prev_subbuf, size_t prev_padding)
-{
-	struct blk_trace *bt;
-
-	if (!relay_buf_full(buf))
-		return 1;
-
-	bt = buf->chan->private_data;
-	atomic_inc(&bt->dropped);
-	return 0;
-}
-
-static int blk_remove_buf_file_callback(struct dentry *dentry)
-{
-	debugfs_remove(dentry);
-	return 0;
-}
-
-static struct dentry *blk_create_buf_file_callback(const char *filename,
-		   struct dentry *parent,
-		   int mode,
-		   struct rchan_buf *buf,
-		   int *is_global)
-{
-	return debugfs_create_file(filename, mode, parent, buf,
-	&relay_file_operations);
-}
-
-static struct rchan_callbacks blk_relay_callbacks = {
-	.subbuf_start		= blk_subbuf_start_callback,
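-	.create_buf_file	= blk_create_buf_file_callback,
-	.remove_buf_file	= blk_remove_buf_file_callback,
-};
-
 /*
  * Setup everything required to start tracing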

[RFC] Generic Trace Setup and Control (GTSC) kernel API (1/3)

2007-06-17 Thread David Wilder

Generic Trace Setup and Control (GTSC) kernel API.

This patch and the patches to follow create a kernel API that simplifies
the use of the relay subsystem.  Any relay-based tracing application has
a common set of operations that it must perform to set up and control its
relay channel(s).  Block trace is an example of such an application.

The goal of GTSC is simply to abstract this generic code out of block
trace and make it available for other services to use.  Doing so lets us
reduce the code in block trace as well as build a common trace control
interface.

I am submitting 3 patches for review.  The first is included in this email;
2 and 3 will follow in separate emails.

1/3  GTSC documentation patch.  (gtsc-documentation.patch)
2/3  The GTSC code itself. (gtsc.patch)
3/3  Patches to convert blktrace to the new GTSC API.
     (convert-blktrace-to-gtsc.patch, blktrace-gtsc-user.patch)

The documentation patch describes the API and includes a simple kernel
module that demonstrates the GTSC.

I will be sending one additional patch not directly related to the GTSC.
This patch fixes a bug in the relay read interface.  I mention it here only
because you will need to apply it to make the GTSC example work properly.
This patch is named relay-file-read-start-pos-fix.patch.


--
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA 
[EMAIL PROTECTED]

(503)578-3789

This patch provides the documentation for the Generic Trace
Setup and Control (GTSC) API.  In the kernel, GTSC provides a simple
API for starting and managing data channels to user space.  GTSC
builds on the relay interface.   The GTSC itself is provided in a separate
patch.

Signed-off-by: David Wilder <[EMAIL PROTECTED]>

 Documentation/gtsc.txt |  239 
 1 files changed, 239 insertions(+), 0 deletions(-)

diff --git a/Documentation/gtsc.txt b/Documentation/gtsc.txt
new file mode 100644
index 000..5a3a1b5
--- /dev/null
+++ b/Documentation/gtsc.txt
@@ -0,0 +1,239 @@
+Generic Trace Setup and Control (GTSC)
+======================================
+In the kernel, GTSC provides a simple API for starting and managing
+data channels to user space.  GTSC builds on the relay interface. For a
+complete description of the relay interface, please see:
+Documentation/filesystems/relay.txt.
+
+GTSC provides one layer in a complete tracing application.  The idea of
+the GTSC is to provide a kernel API for the setup and control of tracing
+channels.  Users of GTSC must provide a data layer responsible for formatting
+and writing data into the trace channels.  
+
+A layered approach to tracing
+=============================
+A complete kernel tracing application consists of a data provider and a data
+consumer.  Both provider and consumer contain three layers; each layer works
+in tandem with the corresponding layer in the opposite side.  The layers are
+represented in the following diagram.
+  
+Provider Data layer
+	Formats raw trace data and provides data-related service.
+	For example, adding timestamps used by consumer to sort data.
+
+Provider Control layer
+	Provided by GTSC. Creates trace channels and informs the data layer
+	and consumer of the current state of the trace channels.
+
+Provider Buffering layer
+	Provided by relay. This layer buffers data in the
+	kernel for consumption by the consumer's buffer
+	layer.
+
+Provider (in-kernel facility)
+-
+Consumer (user application)
+
+
+Consumer Buffer layer
+	Reads/consumes data from the provider's data buffers.
+
+Consumer Control layer
+	Communicates to the provider's control layer to control the state
+	of the trace channels. 
+
+Consumer Data layer
+	Sorts and formats data as provided by the provider's data layer.
+
+The provider is coded as a kernel facility.  The consumer is coded as
+a user application.
+ 
+
+GTSC - Features
+===============
+The GTSC exploits services and features provided by relay.  These features are:
+- The creation and destruction of relay channels.
+- Buffer management.  Overwrite or non-overwrite modes can be selected
+  as well as global or per-CPU buffering.
+
+Overwrite mode can be called "flight recorder mode".  Flight recorder mode
+is selected by setting the GTSC_FLIGHT flag when creating trace channels.
+In flight mode when a tracing buffer is full, the oldest records in the buffer
+will be discarded to make room as new records arrive.  In the default
+non-overwrite mode, new records may be written only if the buffer has room.
+In either case, to prevent data loss, a user space reader must keep the buffers
+drained. GTSC provides a means to detect the number of records that have been
+dropped due to a buffer-full condition (non-overwrite mode only).
+
+When per-CPU buffers are used, relay creates one debugfs file for each running
+CPU.  The user-space consumer of the data is responsible for reading the
+per-CPU buffers

[patch] s390 kprobes: Align probe address

2007-03-21 Thread David Wilder
[This patch applies to both linux and mm trees.  Please send comments 
off list, thanks]
Running a probe on s390 with a probe address that is not 4-byte aligned
results in a kernel BUG.  The problem is that the stura instruction used
by swap_instruction requires the destination address to be 4-byte aligned.
As stura only writes 4 bytes, aligning to the next 4-byte aligned address
results in the breakpoint instruction being stored past the probe address.
The fix is to align the address backward (to the previous 4-byte aligned
address) and write the two-byte breakpoint instruction into the appropriate
bytes.

Signed-off-by: David Wilder <[EMAIL PROTECTED]>

diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index 8af549e..993f353 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -167,7 +167,7 @@ static int __kprobes swap_instruction(vo
 	 * shall not cross any page boundaries (vmalloc area!) when writing
 	 * the new instruction.
 	 */
-	addr = (u32 *)ALIGN((unsigned long)args->ptr, 4);
+	addr = (u32 *)((unsigned long)args->ptr & -4UL);
 	if ((unsigned long)args->ptr & 2)
 		instr = ((*addr) & 0x) | args->new;
 	else
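
The difference between the two expressions in this hunk can be checked in user space. The sketch below is an illustration, not kernel code: align_up4() reproduces what the kernel's ALIGN(x, 4) computes, align_down4() is the replacement expression, and in_second_halfword() is the test that follows it. All three helper names are made up for this example.

```c
/* What ALIGN(x, 4) computes: round up to the next multiple of 4. */
static unsigned long align_up4(unsigned long x)
{
	return (x + 3UL) & ~3UL;
}

/* The replacement expression: round down to the previous multiple of 4. */
static unsigned long align_down4(unsigned long x)
{
	return x & -4UL;
}

/* Nonzero when the 2-byte instruction at x is the second halfword of its word. */
static int in_second_halfword(unsigned long x)
{
	return (x & 2UL) != 0;
}
```

For a probe at 0x1002, align_up4() yields 0x1004, which is past the probe address, while align_down4() yields 0x1000, the word that contains it. in_second_halfword() then selects which half of that word receives the breakpoint.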

