subject:"Using sched_clock for mmio\-trace"

On Fri, Feb 16, 2007 at 02:47:45PM -0800, Daniel Walker wrote:
> > > Gets pretty ugly .. The clocksource interface already has a positive
> > > rating to describe the "best" clocks in the system, which is used to
> > > return the "best" clock .. Where the maintainers of the system give each
> > > clock a rating. I would imagine most people would just get the so called
> > > "best" clock which has the best rating..
> > > 
> > > I'm starting to think this long flags stringing effect could happen with
> > > negative flags also, but it's seems a lot less likely.
> > 
> > The amount of flag stringing should be the same.
> 
> I don't think so .. The common case with negative flags is no flags,
> then next would be CLOCKSOURCE_UNSTABLE. At most I would guess two
> flags .. The other direction your likely to have people using all flags
> most of the time. That's why I showed a function call with all the flags
> listed.

I think you still misunderstand me. The common case is still no flags.

clocksource_get_clock_must_have(0) would return clocks that are stable
and unstable. clocksource_get_clock_must_have(CLOCKSOURCE_STABLE) would
only return clocks that are stable, just like
clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) only returns clocks
that are stable.

> > > > instead of
> > > > 
> > > > clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
> > > > clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)
> > > > 
> > > > Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
> > > > anyway to specify that I want a 64bit timer, only a way to specify that
> > > > I don't.
> > > 
> > > I might add a way to get specific flags, but I still think the flags
> > > should be mostly negative features.
> > 
> > Yeah, the problem is that all of the features are negative except for
> > CLOCKSOURCE_64BIT, so you can't mask for it.
> 
> It's meant as a negative feature. So you can mask it if you can't handle
> the math .. The only 64bit clock I know off is the tsc, and it's got the
> highest rating of all clocks.

Ah ok, I see that now. Maybe CLOCKSOURCE_OVER_32BITS would be a better
name? It might convey the negativity better...

-Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, 2007-02-16 at 17:10 -0500, Jeff Muizelaar wrote:

> 
> I still meant for _with_features to have same semantics so calling:
> 
> clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE
>  |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS);
> 
> would be equivalent to calling:
> 
> clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_UNSTABLE
>  |CLOCKSOURC_NOT_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_NOT_CONTINUOUS);
> 
> The only difference is that the naming is reversed. i.e. it isn't a
> doulbe negative.

Yes, I assumed that ..

> perhaps a better name would clocksource_get_clock_must_have() or
> clocksource_get_clock_must_be()
>  
> > Gets pretty ugly .. The clocksource interface already has a positive
> > rating to describe the "best" clocks in the system, which is used to
> > return the "best" clock .. Where the maintainers of the system give each
> > clock a rating. I would imagine most people would just get the so called
> > "best" clock which has the best rating..
> > 
> > I'm starting to think this long flags stringing effect could happen with
> > negative flags also, but it's seems a lot less likely.
> 
> The amount of flag stringing should be the same.

I don't think so .. The common case with negative flags is no flags,
then next would be CLOCKSOURCE_UNSTABLE. At most I would guess two
flags .. The other direction your likely to have people using all flags
most of the time. That's why I showed a function call with all the flags
listed.

To be clear the clocksources are already sorted with a rating (no flags
involved). 

So for example, On i386 the clocks are sorts as follows,

tsc (300)
acpi_pm (200)
pit (110)
jiffies (1)

So the bulk of the positive information is already there for you. The
tsc clock is by far the best in the system if it's stable. If you ignore
the flags, then you will always be given the tsc .

> > > instead of
> > > 
> > > clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
> > > clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)
> > > 
> > > Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
> > > anyway to specify that I want a 64bit timer, only a way to specify that
> > > I don't.
> > 
> > I might add a way to get specific flags, but I still think the flags
> > should be mostly negative features.
> 
> Yeah, the problem is that all of the features are negative except for
> CLOCKSOURCE_64BIT, so you can't mask for it.

It's meant as a negative feature. So you can mask it if you can't handle
the math .. The only 64bit clock I know off is the tsc, and it's got the
highest rating of all clocks.


> It might be good if something like these explanations could be added as
> comments beside the feature flags.

Ok .. On my TODO list ..

> So, if I was a systemtap style user of the clocksource api, I'd still have to
> do something like:
> 
> init() {
>   // assume this gives me the tsc
>   clock = clocksource_get_masked_clock(CLOCKSOURCE_NOT_ATOMIC);
>   register a cpu_freq notifier
> }
> 
> trace() {
>   trace.time =
>   
> compute_cyc2_ns_by_hand_using_info_from_cpu_freq(clocksource_read(clock))
> }
> 
> In this case it doesn't look like using the clocksource stuff helps this
> style of user much as all that is abstracted away is really the reading
> the tsc. On the other hand it sounds like if you want all this stuff
> taken care of for you, most people should be using do_gettimeofday()
> instead.

Instead of using an unstable clock and trying to compensate for it, the
solution is not to use unstable clocks. The clocksource stuff does help
with that.. You can either use the tsc until it becomes unstable, or use
the acpi_pm clock which never becomes unstable. Both are used in the
same way.

> init() {
  // gives you the acpi_pm
  clock = 
clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_NOT_ATOMIC);
>   register a cpu_freq notifier
> }
> 
> trace() {
>   trace.time =
  cyc2ns(clock, clocksource_read(clock))
> }
> 


Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, Feb 16, 2007 at 01:06:19PM -0800, Daniel Walker wrote:
> On Fri, 2007-02-16 at 14:34 -0500, Jeff Muizelaar wrote:
> > It think it would be better if you had sometime like
> > 'clocksource_get_clock_with_features()' that took flags describing the
> > needed characteristics instead of the unwanted ones.
> > 
> > e.g.
> > clocksource_get_clock_with_features(CLOCKSOURCE_STABLE)
> > or
> > clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED)
> > 
> 
> One reason I did it the other way is because it's easier to specify what
> you know you don't want, than to specify all the things that you know
> you do want.
> 
> Almost everyone would want every flags listed,
> 
> clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE
> |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS);

I still meant for _with_features to have same semantics so calling:

clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE
 |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS);

would be equivalent to calling:

clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_UNSTABLE
 |CLOCKSOURC_NOT_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_NOT_CONTINUOUS);

The only difference is that the naming is reversed. i.e. it isn't a
doulbe negative.

perhaps a better name would clocksource_get_clock_must_have() or
clocksource_get_clock_must_be()
 
> Gets pretty ugly .. The clocksource interface already has a positive
> rating to describe the "best" clocks in the system, which is used to
> return the "best" clock .. Where the maintainers of the system give each
> clock a rating. I would imagine most people would just get the so called
> "best" clock which has the best rating..
> 
> I'm starting to think this long flags stringing effect could happen with
> negative flags also, but it's seems a lot less likely.

The amount of flag stringing should be the same.

> 
> > instead of
> > 
> > clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
> > clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)
> > 
> > Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
> > anyway to specify that I want a 64bit timer, only a way to specify that
> > I don't.
> 
> I might add a way to get specific flags, but I still think the flags
> should be mostly negative features.

Yeah, the problem is that all of the features are negative except for
CLOCKSOURCE_64BIT, so you can't mask for it.

> > It also isn't clear what the implications of some of the flags are:
> > e.g:
> > NOT_CONTINUOUS - don't really have any idea what this means.
> 
> It means the clock uses an interrupt to extend it's precision, or it's
> not a real cycle counter and depends only on an interrupt for timing.
> 
> > UNSTABLE - this means the frequency can change right?
> > Does PM_AFFECTED imply UNSTABLE?
> 
> PM_AFFECTED means it could become unstable, and CLOCKSOURC_UNSTABLE
> means it's already become unstable.
> 
> > NOT_ATOMIC - does this affect me as user?
> 
> It could .. The PIT clock for example takes a spinlock. Some users might
> not want to use that clock cause they call from a context where that
> isn't acceptable (LTT for example). It's also slower to read from.
> 
> > PM_AFFECTED - it looks like the stp code deals with cpu speed
> > changing. Does the clocksource code do this for me with
> > cyc2ns?  If it does are there any reason I would want to
> > avoid PM_AFFECTED clocks? If it doesn't how do I know
> > that I need to correct it myself.
> 
> There is a block notification system that lets you know when a clock
> becomes unstable . cyc2ns doesn't compensate for frequency changes . The
> TSC for example could fluctuate frequency pretty often. sched_clock for
> example uses the clock no matter what the state is , which is why I
> leave unstable clock in the list.

It might be good if something like these explanations could be added as
comments beside the feature flags.

So, if I was a systemtap style user of the clocksource api, I'd still have to
do something like:

init() {
// assume this gives me the tsc
clock = clocksource_get_masked_clock(CLOCKSOURCE_NOT_ATOMIC);
register a cpu_freq notifier
}

trace() {
trace.time =

compute_cyc2_ns_by_hand_using_info_from_cpu_freq(clocksource_read(clock))
}

In this case it doesn't look like using the clocksource stuff helps this
style of user much as all that is abstracted away is really the reading
the tsc. On the other hand it sounds like if you want all this stuff
taken care of for you, most people should be using do_gettimeofday()
instead.

-Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

Hi -

On Fri, Feb 16, 2007 at 09:03:23PM +0100, Andi Kleen wrote:
> > We in systemtap land have the same problem, and so far made do with
> > slightly postprocessed per-cpu TSC values.
> 
> 90+% likely you're not solving your problem correctly this way.

Yes, it was done as a last resort.

We need facility that we can call from even more demanding contexts
than interrupt handlers, considering that kprobes can be placed nearly
anywhere.  This is one of the reasons why we don't just use good old
do_gettimeofday(), since it takes locks and can lead to lock recursion
if parts of itself are probed.

- FChE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, 2007-02-16 at 14:34 -0500, Jeff Muizelaar wrote:
> On Fri, Feb 16, 2007 at 10:28:50AM -0800, Daniel Walker wrote:
> > On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote:
> > > On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote:
> > > > I've been working on a patch set (below), to expose the clocksources
> > > > used by generic time to multiple users . It would allow timestamps from
> > > > different clocks in a generic way. It's not merged, but I'd appreciate
> > > > any input either of you might have..
> > > > 
> > > > ftp://source.mvista.com/pub/dwalker/clocksource/
> > > 
> > > Is it possible to see the resulting clocksource.h and maybe
> > > clocksource.c after the patch set? That would make looking at it much
> > > easier.
> > 
> > Well you could just apply the patch set, but I stuck them in the same
> > directory as above .. I'll delete them in 24 hours or so ..
> > 
> > At one point I replaced sched_clock() , 
> > 
> > ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch
> > 
> > The API is similar to that version, and sched_clock was the simplest
> > user of the API that I've done.
> 
> Ok, so it would basically be:
> 
> init()
> {
>   clocksource *clock = clocksource_get_best_clock();
> }
> 
> trace()
> {
>   trace.time = cyc2ns(clock, clocksource_read(clock));
> }
> 
> This seems pretty sane to me. 

sched_clock has the down side of not taking into account rollover .. The
scheduler doesn't care if it gets a few bad timestamps now and then, but
that might screw up other code..

Also you could read cycles , and then later convert to nanoseconds when
it's needed .. Which makes taking the timestamps a little faster.

> The blk-trace code calibrates a per cpu offset for the sched_clock()
> time (see blk_trace_check_cpu_time and blk_trace_set_ht_offsets). Does
> the clocksource stuff help me with this or would I still need to do
> something like that?

Most of the clocksources are not per cpu .. But the tsc clocksource
doesn't keep any per cpu offset information. There has been some work on
syncing the TSC across cpus so I'm not sure to what extend an offset
would be needed.

> I also noticed that you have a clocksource_get_masked_clock() call. This
> seems like a pretty awkward API to me. The first thing that came to mind
> when I read the name was 'what is a masked clock'. When I realized that
> it meant 'a clock w/o this flag', I still found it awkward that one has
> to specify what that don't want. e.g 'I don't want a clock that is not
> continuous.'

yeah it is a little strange . 

> It think it would be better if you had sometime like
> 'clocksource_get_clock_with_features()' that took flags describing the
> needed characteristics instead of the unwanted ones.
> 
> e.g.
> clocksource_get_clock_with_features(CLOCKSOURCE_STABLE)
> or
> clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED)
> 

One reason I did it the other way is because it's easier to specify what
you know you don't want, than to specify all the things that you know
you do want.

Almost everyone would want every flags listed,

clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE
|CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS);

Gets pretty ugly .. The clocksource interface already has a positive
rating to describe the "best" clocks in the system, which is used to
return the "best" clock .. Where the maintainers of the system give each
clock a rating. I would imagine most people would just get the so called
"best" clock which has the best rating..

I'm starting to think this long flags stringing effect could happen with
negative flags also, but it's seems a lot less likely.

> instead of
> 
> clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
> clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)
> 
> Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
> anyway to specify that I want a 64bit timer, only a way to specify that
> I don't.

I might add a way to get specific flags, but I still think the flags
should be mostly negative features.

> It also isn't clear what the implications of some of the flags are:
> e.g:
>   NOT_CONTINUOUS - don't really have any idea what this means.

It means the clock uses an interrupt to extend it's precision, or it's
not a real cycle counter and depends only on an interrupt for timing.

>   UNSTABLE - this means the frequency can change right?
>   Does PM_AFFECTED imply UNSTABLE?

PM_AFFECTED means it could become unstable, and CLOCKSOURC_UNSTABLE
means it's already become unstable.

>   NOT_ATOMIC - does this affect me as user?

It could .. The PIT clock for example takes a spinlock. Some users might
not want to use that clock cause they call from a context where that
isn't acceptable (LTT for example). It's also slower to read from.

>   PM_AFFECTED - it looks like the stp code deals with cpu speed
>   changing.

Re: Using sched_clock for mmio-trace

On Fri, Feb 16, 2007 at 10:44:15AM -0800, Randy Dunlap wrote:
> On Fri, 16 Feb 2007 13:30:14 -0500 Jeff Muizelaar wrote:
> 
> > On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote:
> > > Jeff Muizelaar <[EMAIL PROTECTED]> writes:
> > > 
> > > > I've built a tool with the goal of logging mmio writes and reads by
> > > > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.
> > > 
> > > FWIW, this is exactly a type of add-on trace patch that could be
> > > mooted by adoption of the ltt/systemtap "marker" facility.  With it,
> > > you would not need so much code (e.g. no new user-space tools at all,
> > > reuse of common tracing buffer logic, permanently placed hooks) and
> > > would probably get more utility.
> > 
> > Is there more information on this "marker" facility? e.g. what is a
> > marker? Are they just like tracepoints?
> 
> On lkml for the past few days/months.
> Look for "Linux Kernel Markers" in the subject line.
> 
> Mathieu, do you have a web site for LK Markers?
> 

Yeah, so if I understand correctly, markers are basically compile time locations
that you can attach function calls to at run time, right?. If so, I
don't think they are of much use to me.

What I am doing is page-faulting on every read or write to an mmio
region, decoding the faulting instruction and passing the decoded
information up to userspace through relayfs.

-Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, Feb 16, 2007 at 09:02:44PM +0100, Andi Kleen wrote:
> Jeff Muizelaar <[EMAIL PROTECTED]> writes:
> > 
> > The question is, what api should I be using? I need something that can
> > be called from inside interrupt handlers, and obviously the more
> > accurate and the lower the overhead the better.
> 
> Use do_gettimeofday(). sched_clock() is not for general use
> and only for some very limited use cases and will give you
> unexpected results in several cases.
> 
> There are a few cases where gtod is still a little slow, but these
> are being addressed. In many cases it is fast. In some hardware
> it stays slow, but there is not much that can be done about that
> because of the hardware design.
> 
> It works fine from interrupt handlers and other strange contexts.

Ok, I'll probably use it then. How does the overhead of calling
do_gettimteofday() compare to doing an mmio read/write over PCI express?
i.e. is it going to be a performance problem if I call do_gettimeofday
for every mmio read/write?

Also, is there any good reason blk-trace doesn't use do_gettimeofday()? 

-Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, Feb 16, 2007 at 10:28:50AM -0800, Daniel Walker wrote:
> On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote:
> > On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote:
> > > I've been working on a patch set (below), to expose the clocksources
> > > used by generic time to multiple users . It would allow timestamps from
> > > different clocks in a generic way. It's not merged, but I'd appreciate
> > > any input either of you might have..
> > > 
> > > ftp://source.mvista.com/pub/dwalker/clocksource/
> > 
> > Is it possible to see the resulting clocksource.h and maybe
> > clocksource.c after the patch set? That would make looking at it much
> > easier.
> 
> Well you could just apply the patch set, but I stuck them in the same
> directory as above .. I'll delete them in 24 hours or so ..
> 
> At one point I replaced sched_clock() , 
> 
> ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch
> 
> The API is similar to that version, and sched_clock was the simplest
> user of the API that I've done.

Ok, so it would basically be:

init()
{
clocksource *clock = clocksource_get_best_clock();
}

trace()
{
trace.time = cyc2ns(clock, clocksource_read(clock));
}

This seems pretty sane to me. 

The blk-trace code calibrates a per cpu offset for the sched_clock()
time (see blk_trace_check_cpu_time and blk_trace_set_ht_offsets). Does
the clocksource stuff help me with this or would I still need to do
something like that?

I also noticed that you have a clocksource_get_masked_clock() call. This
seems like a pretty awkward API to me. The first thing that came to mind
when I read the name was 'what is a masked clock'. When I realized that
it meant 'a clock w/o this flag', I still found it awkward that one has
to specify what that don't want. e.g 'I don't want a clock that is not
continuous.'

It think it would be better if you had sometime like
'clocksource_get_clock_with_features()' that took flags describing the
needed characteristics instead of the unwanted ones.

e.g.
clocksource_get_clock_with_features(CLOCKSOURCE_STABLE)
or
clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED)

instead of:

clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)

Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
anyway to specify that I want a 64bit timer, only a way to specify that
I don't.

It also isn't clear what the implications of some of the flags are:
e.g:
NOT_CONTINUOUS - don't really have any idea what this means.
UNSTABLE - this means the frequency can change right?
Does PM_AFFECTED imply UNSTABLE?
NOT_ATOMIC - does this affect me as user?
PM_AFFECTED - it looks like the stp code deals with cpu speed
changing. Does the clocksource code do this for me with
cyc2ns?  If it does are there any reason I would want to
avoid PM_AFFECTED clocks? If it doesn't how do I know
that I need to correct it myself.

Hope this helps,

-Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

[EMAIL PROTECTED] (Frank Ch. Eigler) writes:
> 
> We in systemtap land have the same problem, and so far made do with
> slightly postprocessed per-cpu TSC values.

90+% likely you're not solving your problem correctly this way.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

Jeff Muizelaar <[EMAIL PROTECTED]> writes:
> 
> The question is, what api should I be using? I need something that can
> be called from inside interrupt handlers, and obviously the more
> accurate and the lower the overhead the better.

Use do_gettimeofday(). sched_clock() is not for general use
and only for some very limited use cases and will give you
unexpected results in several cases.

There are a few cases where gtod is still a little slow, but these
are being addressed. In many cases it is fast. In some hardware
it stays slow, but there is not much that can be done about that
because of the hardware design.

It works fine from interrupt handlers and other strange contexts. 

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

2007-02-16 Thread Randy Dunlap

On Fri, 16 Feb 2007 13:30:14 -0500 Jeff Muizelaar wrote:

> On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote:
> > Jeff Muizelaar <[EMAIL PROTECTED]> writes:
> > 
> > > I've built a tool with the goal of logging mmio writes and reads by
> > > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.
> > 
> > FWIW, this is exactly a type of add-on trace patch that could be
> > mooted by adoption of the ltt/systemtap "marker" facility.  With it,
> > you would not need so much code (e.g. no new user-space tools at all,
> > reuse of common tracing buffer logic, permanently placed hooks) and
> > would probably get more utility.
> 
> Is there more information on this "marker" facility? e.g. what is a
> marker? Are they just like tracepoints?

On lkml for the past few days/months.
Look for "Linux Kernel Markers" in the subject line.

Mathieu, do you have a web site for LK Markers?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote:
> On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote:
> > I've been working on a patch set (below), to expose the clocksources
> > used by generic time to multiple users . It would allow timestamps from
> > different clocks in a generic way. It's not merged, but I'd appreciate
> > any input either of you might have..
> > 
> > ftp://source.mvista.com/pub/dwalker/clocksource/
> 
> Is it possible to see the resulting clocksource.h and maybe
> clocksource.c after the patch set? That would make looking at it much
> easier.

Well you could just apply the patch set, but I stuck them in the same
directory as above .. I'll delete them in 24 hours or so ..

At one point I replaced sched_clock() , 

ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch

The API is similar to that version, and sched_clock was the simplest
user of the API that I've done.

Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote:
> Jeff Muizelaar <[EMAIL PROTECTED]> writes:
> 
> > I've built a tool with the goal of logging mmio writes and reads by
> > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.
> 
> FWIW, this is exactly a type of add-on trace patch that could be
> mooted by adoption of the ltt/systemtap "marker" facility.  With it,
> you would not need so much code (e.g. no new user-space tools at all,
> reuse of common tracing buffer logic, permanently placed hooks) and
> would probably get more utility.

Is there more information on this "marker" facility? e.g. what is a
marker? Are they just like tracepoints?

> > [...]  The question is, what [timer] api should I be using? I need
> > something that can be called from inside interrupt handlers, and
> > obviously the more accurate and the lower the overhead the better.
> 
> We in systemtap land have the same problem, and so far made do with
> slightly postprocessed per-cpu TSC values.

Thanks, I'll have a look at systemtap does. It's the stuff
src/runtime/time.c right?

Also, you might want to move this stuff from the README to someplace
more prominent on the systemtap website. It's nice not to have to dig
far to get the source...

Download systemtap sources snapshot or from CVS:
ftp://sources.redhat.com/pub/systemtap/snapshots/ 
(or)
cvs -d :pserver:[EMAIL PROTECTED]:/cvs/systemtap login
# enter "anoncvs" as the password
cvs -d :pserver:[EMAIL PROTECTED]:/cvs/systemtap co src

-Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote:
> I've been working on a patch set (below), to expose the clocksources
> used by generic time to multiple users . It would allow timestamps from
> different clocks in a generic way. It's not merged, but I'd appreciate
> any input either of you might have..
> 
> ftp://source.mvista.com/pub/dwalker/clocksource/

Is it possible to see the resulting clocksource.h and maybe
clocksource.c after the patch set? That would make looking at it much
easier.

-Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, 2007-02-16 at 11:30 -0500, Frank Ch. Eigler wrote:
> Jeff Muizelaar <[EMAIL PROTECTED]> writes:
> 
> > I've built a tool with the goal of logging mmio writes and reads by
> > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.
> 
> FWIW, this is exactly a type of add-on trace patch that could be
> mooted by adoption of the ltt/systemtap "marker" facility.  With it,
> you would not need so much code (e.g. no new user-space tools at all,
> reuse of common tracing buffer logic, permanently placed hooks) and
> would probably get more utility.
> 
> > [...]  The question is, what [timer] api should I be using? I need
> > something that can be called from inside interrupt handlers, and
> > obviously the more accurate and the lower the overhead the better.
> 
> We in systemtap land have the same problem, and so far made do with
> slightly postprocessed per-cpu TSC values.

I've been working on a patch set (below), to expose the clocksources
used by generic time to multiple users . It would allow timestamps from
different clocks in a generic way. It's not merged, but I'd appreciate
any input either of you might have..

ftp://source.mvista.com/pub/dwalker/clocksource/

Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

Jeff Muizelaar <[EMAIL PROTECTED]> writes:

> I've built a tool with the goal of logging mmio writes and reads by
> device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.

FWIW, this is exactly a type of add-on trace patch that could be
mooted by adoption of the ltt/systemtap "marker" facility.  With it,
you would not need so much code (e.g. no new user-space tools at all,
reuse of common tracing buffer logic, permanently placed hooks) and
would probably get more utility.

> [...]  The question is, what [timer] api should I be using? I need
> something that can be called from inside interrupt handlers, and
> obviously the more accurate and the lower the overhead the better.

We in systemtap land have the same problem, and so far made do with
slightly postprocessed per-cpu TSC values.

- FChE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

Jeff Muizelaar [EMAIL PROTECTED] writes:

 I've built a tool with the goal of logging mmio writes and reads by
 device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.

FWIW, this is exactly a type of add-on trace patch that could be
mooted by adoption of the ltt/systemtap marker facility.  With it,
you would not need so much code (e.g. no new user-space tools at all,
reuse of common tracing buffer logic, permanently placed hooks) and
would probably get more utility.

 [...]  The question is, what [timer] api should I be using? I need
 something that can be called from inside interrupt handlers, and
 obviously the more accurate and the lower the overhead the better.

We in systemtap land have the same problem, and so far made do with
slightly postprocessed per-cpu TSC values.


- FChE
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, 2007-02-16 at 11:30 -0500, Frank Ch. Eigler wrote:
 Jeff Muizelaar [EMAIL PROTECTED] writes:
 
  I've built a tool with the goal of logging mmio writes and reads by
  device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.
 
 FWIW, this is exactly a type of add-on trace patch that could be
 mooted by adoption of the ltt/systemtap marker facility.  With it,
 you would not need so much code (e.g. no new user-space tools at all,
 reuse of common tracing buffer logic, permanently placed hooks) and
 would probably get more utility.
 
  [...]  The question is, what [timer] api should I be using? I need
  something that can be called from inside interrupt handlers, and
  obviously the more accurate and the lower the overhead the better.
 
 We in systemtap land have the same problem, and so far made do with
 slightly postprocessed per-cpu TSC values.

I've been working on a patch set (below), to expose the clocksources
used by generic time to multiple users . It would allow timestamps from
different clocks in a generic way. It's not merged, but I'd appreciate
any input either of you might have..

ftp://source.mvista.com/pub/dwalker/clocksource/

Daniel

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote:
 I've been working on a patch set (below), to expose the clocksources
 used by generic time to multiple users . It would allow timestamps from
 different clocks in a generic way. It's not merged, but I'd appreciate
 any input either of you might have..
 
 ftp://source.mvista.com/pub/dwalker/clocksource/

Is it possible to see the resulting clocksource.h and maybe
clocksource.c after the patch set? That would make looking at it much
easier.

-Jeff
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote:
 Jeff Muizelaar [EMAIL PROTECTED] writes:
 
  I've built a tool with the goal of logging mmio writes and reads by
  device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.
 
 FWIW, this is exactly a type of add-on trace patch that could be
 mooted by adoption of the ltt/systemtap marker facility.  With it,
 you would not need so much code (e.g. no new user-space tools at all,
 reuse of common tracing buffer logic, permanently placed hooks) and
 would probably get more utility.

Is there more information on this marker facility? e.g. what is a
marker? Are they just like tracepoints?

  [...]  The question is, what [timer] api should I be using? I need
  something that can be called from inside interrupt handlers, and
  obviously the more accurate and the lower the overhead the better.
 
 We in systemtap land have the same problem, and so far made do with
 slightly postprocessed per-cpu TSC values.

Thanks, I'll have a look at systemtap does. It's the stuff
src/runtime/time.c right?

Also, you might want to move this stuff from the README to someplace
more prominent on the systemtap website. It's nice not to have to dig
far to get the source...

Download systemtap sources snapshot or from CVS:
ftp://sources.redhat.com/pub/systemtap/snapshots/ 
(or)
cvs -d :pserver:[EMAIL PROTECTED]:/cvs/systemtap login
# enter anoncvs as the password
cvs -d :pserver:[EMAIL PROTECTED]:/cvs/systemtap co src

-Jeff
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote:
 On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote:
  I've been working on a patch set (below), to expose the clocksources
  used by generic time to multiple users . It would allow timestamps from
  different clocks in a generic way. It's not merged, but I'd appreciate
  any input either of you might have..
  
  ftp://source.mvista.com/pub/dwalker/clocksource/
 
 Is it possible to see the resulting clocksource.h and maybe
 clocksource.c after the patch set? That would make looking at it much
 easier.

Well you could just apply the patch set, but I stuck them in the same
directory as above .. I'll delete them in 24 hours or so ..

At one point I replaced sched_clock() , 

ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch

The API is similar to that version, and sched_clock was the simplest
user of the API that I've done.

Daniel

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

2007-02-16 Thread Randy Dunlap

On Fri, 16 Feb 2007 13:30:14 -0500 Jeff Muizelaar wrote:

 On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote:
  Jeff Muizelaar [EMAIL PROTECTED] writes:
  
   I've built a tool with the goal of logging mmio writes and reads by
   device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.
  
  FWIW, this is exactly a type of add-on trace patch that could be
  mooted by adoption of the ltt/systemtap marker facility.  With it,
  you would not need so much code (e.g. no new user-space tools at all,
  reuse of common tracing buffer logic, permanently placed hooks) and
  would probably get more utility.
 
 Is there more information on this marker facility? e.g. what is a
 marker? Are they just like tracepoints?

On lkml for the past few days/months.
Look for Linux Kernel Markers in the subject line.

Mathieu, do you have a web site for LK Markers?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

Jeff Muizelaar [EMAIL PROTECTED] writes:
 
 The question is, what api should I be using? I need something that can
 be called from inside interrupt handlers, and obviously the more
 accurate and the lower the overhead the better.

Use do_gettimeofday(). sched_clock() is not for general use
and only for some very limited use cases and will give you
unexpected results in several cases.

There are a few cases where gtod is still a little slow, but these
are being addressed. In many cases it is fast. In some hardware
it stays slow, but there is not much that can be done about that
because of the hardware design.

It works fine from interrupt handlers and other strange contexts. 

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

[EMAIL PROTECTED] (Frank Ch. Eigler) writes:
 
 We in systemtap land have the same problem, and so far made do with
 slightly postprocessed per-cpu TSC values.

90+% likely you're not solving your problem correctly this way.

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, Feb 16, 2007 at 10:28:50AM -0800, Daniel Walker wrote:
On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote:
On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote:
I've been working on a patch set (below), to expose the clocksources
used by generic time to multiple users . It would allow timestamps from
different clocks in a generic way. It's not merged, but I'd appreciate
any input either of you might have..

ftp://source.mvista.com/pub/dwalker/clocksource/

Is it possible to see the resulting clocksource.h and maybe
clocksource.c after the patch set? That would make looking at it much
easier.

Well you could just apply the patch set, but I stuck them in the same
directory as above .. I'll delete them in 24 hours or so ..

At one point I replaced sched_clock() ,

ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch

The API is similar to that version, and sched_clock was the simplest
user of the API that I've done.

Ok, so it would basically be:

init()
{
clocksource *clock = clocksource_get_best_clock();
}

trace()
{
trace.time = cyc2ns(clock, clocksource_read(clock));
}

This seems pretty sane to me.

The blk-trace code calibrates a per cpu offset for the sched_clock()
time (see blk_trace_check_cpu_time and blk_trace_set_ht_offsets). Does
the clocksource stuff help me with this or would I still need to do
something like that?

I also noticed that you have a clocksource_get_masked_clock() call. This
seems like a pretty awkward API to me. The first thing that came to mind
when I read the name was 'what is a masked clock'. When I realized that
it meant 'a clock w/o this flag', I still found it awkward that one has
to specify what that don't want. e.g 'I don't want a clock that is not
continuous.'

It think it would be better if you had sometime like
'clocksource_get_clock_with_features()' that took flags describing the
needed characteristics instead of the unwanted ones.

e.g.
clocksource_get_clock_with_features(CLOCKSOURCE_STABLE)
or
clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED)

instead of:

clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)

Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
anyway to specify that I want a 64bit timer, only a way to specify that
I don't.

It also isn't clear what the implications of some of the flags are:
e.g:
NOT_CONTINUOUS - don't really have any idea what this means.
UNSTABLE - this means the frequency can change right?
Does PM_AFFECTED imply UNSTABLE?
NOT_ATOMIC - does this affect me as user?
PM_AFFECTED - it looks like the stp code deals with cpu speed
changing. Does the clocksource code do this for me with
cyc2ns? If it does are there any reason I would want to
avoid PM_AFFECTED clocks? If it doesn't how do I know
that I need to correct it myself.

Hope this helps,

-Jeff
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, Feb 16, 2007 at 09:02:44PM +0100, Andi Kleen wrote:
 Jeff Muizelaar [EMAIL PROTECTED] writes:
  
  The question is, what api should I be using? I need something that can
  be called from inside interrupt handlers, and obviously the more
  accurate and the lower the overhead the better.
 
 Use do_gettimeofday(). sched_clock() is not for general use
 and only for some very limited use cases and will give you
 unexpected results in several cases.
 
 There are a few cases where gtod is still a little slow, but these
 are being addressed. In many cases it is fast. In some hardware
 it stays slow, but there is not much that can be done about that
 because of the hardware design.
 
 It works fine from interrupt handlers and other strange contexts.

Ok, I'll probably use it then. How does the overhead of calling
do_gettimteofday() compare to doing an mmio read/write over PCI express?
i.e. is it going to be a performance problem if I call do_gettimeofday
for every mmio read/write?

Also, is there any good reason blk-trace doesn't use do_gettimeofday()? 

-Jeff
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, Feb 16, 2007 at 10:44:15AM -0800, Randy Dunlap wrote:
 On Fri, 16 Feb 2007 13:30:14 -0500 Jeff Muizelaar wrote:
 
  On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote:
   Jeff Muizelaar [EMAIL PROTECTED] writes:
   
I've built a tool with the goal of logging mmio writes and reads by
device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.
   
   FWIW, this is exactly a type of add-on trace patch that could be
   mooted by adoption of the ltt/systemtap marker facility.  With it,
   you would not need so much code (e.g. no new user-space tools at all,
   reuse of common tracing buffer logic, permanently placed hooks) and
   would probably get more utility.
  
  Is there more information on this marker facility? e.g. what is a
  marker? Are they just like tracepoints?
 
 On lkml for the past few days/months.
 Look for Linux Kernel Markers in the subject line.
 
 Mathieu, do you have a web site for LK Markers?
 

Yeah, so if I understand correctly, markers are basically compile time locations
that you can attach function calls to at run time, right?. If so, I
don't think they are of much use to me.

What I am doing is page-faulting on every read or write to an mmio
region, decoding the faulting instruction and passing the decoded
information up to userspace through relayfs.

-Jeff
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, 2007-02-16 at 14:34 -0500, Jeff Muizelaar wrote:
On Fri, Feb 16, 2007 at 10:28:50AM -0800, Daniel Walker wrote:
On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote:
On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote:
I've been working on a patch set (below), to expose the clocksources
used by generic time to multiple users . It would allow timestamps from
different clocks in a generic way. It's not merged, but I'd appreciate
any input either of you might have..

ftp://source.mvista.com/pub/dwalker/clocksource/

Is it possible to see the resulting clocksource.h and maybe
clocksource.c after the patch set? That would make looking at it much
easier.

Well you could just apply the patch set, but I stuck them in the same
directory as above .. I'll delete them in 24 hours or so ..

At one point I replaced sched_clock() ,

ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch

The API is similar to that version, and sched_clock was the simplest
user of the API that I've done.

Ok, so it would basically be:

init()
{
clocksource *clock = clocksource_get_best_clock();
}

trace()
{
trace.time = cyc2ns(clock, clocksource_read(clock));
}

This seems pretty sane to me.

sched_clock has the down side of not taking into account rollover .. The
scheduler doesn't care if it gets a few bad timestamps now and then, but
that might screw up other code..

Also you could read cycles , and then later convert to nanoseconds when
it's needed .. Which makes taking the timestamps a little faster.

Most of the clocksources are not per cpu .. But the tsc clocksource
doesn't keep any per cpu offset information. There has been some work on
syncing the TSC across cpus so I'm not sure to what extend an offset
would be needed.

yeah it is a little strange .

It think it would be better if you had sometime like
'clocksource_get_clock_with_features()' that took flags describing the
needed characteristics instead of the unwanted ones.

e.g.
clocksource_get_clock_with_features(CLOCKSOURCE_STABLE)
or
clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED)

One reason I did it the other way is because it's easier to specify what
you know you don't want, than to specify all the things that you know
you do want.

Almost everyone would want every flags listed,

clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE
|CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS);

Gets pretty ugly .. The clocksource interface already has a positive
rating to describe the best clocks in the system, which is used to
return the best clock .. Where the maintainers of the system give each
clock a rating. I would imagine most people would just get the so called
best clock which has the best rating..

I'm starting to think this long flags stringing effect could happen with
negative flags also, but it's seems a lot less likely.

instead of

clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)

Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
anyway to specify that I want a 64bit timer, only a way to specify that
I don't.

I might add a way to get specific flags, but I still think the flags
should be mostly negative features.

It also isn't clear what the implications of some of the flags are:
e.g:
NOT_CONTINUOUS - don't really have any idea what this means.

It means the clock uses an interrupt to extend it's precision, or it's
not a real cycle counter and depends only on an interrupt for timing.

UNSTABLE - this means the frequency can change right?
Does PM_AFFECTED imply UNSTABLE?

PM_AFFECTED means it could become unstable, and CLOCKSOURC_UNSTABLE
means it's already become unstable.

NOT_ATOMIC - does this affect me as user?

It could .. The PIT clock for example takes a spinlock. Some users might
not want to use that clock cause they call from a context where that
isn't acceptable (LTT for example). It's also slower to read from.

PM_AFFECTED - it looks like the stp code deals with cpu speed
changing. Does the clocksource code do this for me with
cyc2ns? If it does are there any reason I would want to

Re: Using sched_clock for mmio-trace

Hi -

On Fri, Feb 16, 2007 at 09:03:23PM +0100, Andi Kleen wrote:
  We in systemtap land have the same problem, and so far made do with
  slightly postprocessed per-cpu TSC values.
 
 90+% likely you're not solving your problem correctly this way.

Yes, it was done as a last resort.

We need facility that we can call from even more demanding contexts
than interrupt handlers, considering that kprobes can be placed nearly
anywhere.  This is one of the reasons why we don't just use good old
do_gettimeofday(), since it takes locks and can lead to lock recursion
if parts of itself are probed.

- FChE
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, Feb 16, 2007 at 01:06:19PM -0800, Daniel Walker wrote:
 On Fri, 2007-02-16 at 14:34 -0500, Jeff Muizelaar wrote:
  It think it would be better if you had sometime like
  'clocksource_get_clock_with_features()' that took flags describing the
  needed characteristics instead of the unwanted ones.
  
  e.g.
  clocksource_get_clock_with_features(CLOCKSOURCE_STABLE)
  or
  clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED)
  
 
 One reason I did it the other way is because it's easier to specify what
 you know you don't want, than to specify all the things that you know
 you do want.
 
 Almost everyone would want every flags listed,
 
 clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE
 |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS);

I still meant for _with_features to have same semantics so calling:

clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE
 |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS);

would be equivalent to calling:

clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_UNSTABLE
 |CLOCKSOURC_NOT_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_NOT_CONTINUOUS);

The only difference is that the naming is reversed. i.e. it isn't a
doulbe negative.

perhaps a better name would clocksource_get_clock_must_have() or
clocksource_get_clock_must_be()
 
 Gets pretty ugly .. The clocksource interface already has a positive
 rating to describe the best clocks in the system, which is used to
 return the best clock .. Where the maintainers of the system give each
 clock a rating. I would imagine most people would just get the so called
 best clock which has the best rating..
 
 I'm starting to think this long flags stringing effect could happen with
 negative flags also, but it's seems a lot less likely.

The amount of flag stringing should be the same.

 
  instead of
  
  clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
  clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)
  
  Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
  anyway to specify that I want a 64bit timer, only a way to specify that
  I don't.
 
 I might add a way to get specific flags, but I still think the flags
 should be mostly negative features.

Yeah, the problem is that all of the features are negative except for
CLOCKSOURCE_64BIT, so you can't mask for it.

  It also isn't clear what the implications of some of the flags are:
  e.g:
  NOT_CONTINUOUS - don't really have any idea what this means.
 
 It means the clock uses an interrupt to extend it's precision, or it's
 not a real cycle counter and depends only on an interrupt for timing.
 
  UNSTABLE - this means the frequency can change right?
  Does PM_AFFECTED imply UNSTABLE?
 
 PM_AFFECTED means it could become unstable, and CLOCKSOURC_UNSTABLE
 means it's already become unstable.
 
  NOT_ATOMIC - does this affect me as user?
 
 It could .. The PIT clock for example takes a spinlock. Some users might
 not want to use that clock cause they call from a context where that
 isn't acceptable (LTT for example). It's also slower to read from.
 
  PM_AFFECTED - it looks like the stp code deals with cpu speed
  changing. Does the clocksource code do this for me with
  cyc2ns?  If it does are there any reason I would want to
  avoid PM_AFFECTED clocks? If it doesn't how do I know
  that I need to correct it myself.
 
 There is a block notification system that lets you know when a clock
 becomes unstable . cyc2ns doesn't compensate for frequency changes . The
 TSC for example could fluctuate frequency pretty often. sched_clock for
 example uses the clock no matter what the state is , which is why I
 leave unstable clock in the list.

It might be good if something like these explanations could be added as
comments beside the feature flags.

So, if I was a systemtap style user of the clocksource api, I'd still have to
do something like:

init() {
// assume this gives me the tsc
clock = clocksource_get_masked_clock(CLOCKSOURCE_NOT_ATOMIC);
register a cpu_freq notifier
}

trace() {
trace.time =

compute_cyc2_ns_by_hand_using_info_from_cpu_freq(clocksource_read(clock))
}

In this case it doesn't look like using the clocksource stuff helps this
style of user much as all that is abstracted away is really the reading
the tsc. On the other hand it sounds like if you want all this stuff
taken care of for you, most people should be using do_gettimeofday()
instead.

-Jeff
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace

On Fri, 2007-02-16 at 17:10 -0500, Jeff Muizelaar wrote:

 
 I still meant for _with_features to have same semantics so calling:
 
 clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE
  |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS);
 
 would be equivalent to calling:
 
 clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_UNSTABLE
  |CLOCKSOURC_NOT_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_NOT_CONTINUOUS);
 
 The only difference is that the naming is reversed. i.e. it isn't a
 doulbe negative.

Yes, I assumed that ..

 perhaps a better name would clocksource_get_clock_must_have() or
 clocksource_get_clock_must_be()
  
  Gets pretty ugly .. The clocksource interface already has a positive
  rating to describe the best clocks in the system, which is used to
  return the best clock .. Where the maintainers of the system give each
  clock a rating. I would imagine most people would just get the so called
  best clock which has the best rating..
  
  I'm starting to think this long flags stringing effect could happen with
  negative flags also, but it's seems a lot less likely.
 
 The amount of flag stringing should be the same.

I don't think so .. The common case with negative flags is no flags,
then next would be CLOCKSOURCE_UNSTABLE. At most I would guess two
flags .. The other direction your likely to have people using all flags
most of the time. That's why I showed a function call with all the flags
listed.

To be clear the clocksources are already sorted with a rating (no flags
involved). 

So for example, On i386 the clocks are sorts as follows,

tsc (300)
acpi_pm (200)
pit (110)
jiffies (1)

So the bulk of the positive information is already there for you. The
tsc clock is by far the best in the system if it's stable. If you ignore
the flags, then you will always be given the tsc .

   instead of
   
   clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
   clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)
   
   Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
   anyway to specify that I want a 64bit timer, only a way to specify that
   I don't.
  
  I might add a way to get specific flags, but I still think the flags
  should be mostly negative features.
 
 Yeah, the problem is that all of the features are negative except for
 CLOCKSOURCE_64BIT, so you can't mask for it.

It's meant as a negative feature. So you can mask it if you can't handle
the math .. The only 64bit clock I know off is the tsc, and it's got the
highest rating of all clocks.


 It might be good if something like these explanations could be added as
 comments beside the feature flags.

Ok .. On my TODO list ..

 So, if I was a systemtap style user of the clocksource api, I'd still have to
 do something like:
 
 init() {
   // assume this gives me the tsc
   clock = clocksource_get_masked_clock(CLOCKSOURCE_NOT_ATOMIC);
   register a cpu_freq notifier
 }
 
 trace() {
   trace.time =
   
 compute_cyc2_ns_by_hand_using_info_from_cpu_freq(clocksource_read(clock))
 }
 
 In this case it doesn't look like using the clocksource stuff helps this
 style of user much as all that is abstracted away is really the reading
 the tsc. On the other hand it sounds like if you want all this stuff
 taken care of for you, most people should be using do_gettimeofday()
 instead.

Instead of using an unstable clock and trying to compensate for it, the
solution is not to use unstable clocks. The clocksource stuff does help
with that.. You can either use the tsc until it becomes unstable, or use
the acpi_pm clock which never becomes unstable. Both are used in the
same way.

 init() {
  // gives you the acpi_pm
  clock = 
clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_NOT_ATOMIC);
   register a cpu_freq notifier
 }
 
 trace() {
   trace.time =
  cyc2ns(clock, clocksource_read(clock))
 }
 


Daniel

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Using sched_clock for mmio-trace