Re: Using sched_clock for mmio-trace
On Sat, Feb 17, 2007 at 04:19:58PM +0100, Thomas Gleixner wrote: > On Sat, 2007-02-17 at 15:56 +0100, Andi Kleen wrote: > > > This is one of the reasons why we don't just use good old > > > do_gettimeofday(), since it takes locks and can lead to lock recursion > > > if parts of itself are probed. > > > > do_gettimeofday doesn't take locks. > > > > Only restriction is that you can't single step it with long > > pauses between instructions. > > Err, it uses read side of xtime lock, so you can not call it from a > place which write locks xtime lock. Err, you can -- seqlocks never deadlock. The only thing that doesn't work is to single step with long enough pauses with interrupts on inbetween that the sequence numbers increase: you get a livelock then. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Sat, Feb 17, 2007 at 04:19:58PM +0100, Thomas Gleixner wrote: On Sat, 2007-02-17 at 15:56 +0100, Andi Kleen wrote: This is one of the reasons why we don't just use good old do_gettimeofday(), since it takes locks and can lead to lock recursion if parts of itself are probed. do_gettimeofday doesn't take locks. Only restriction is that you can't single step it with long pauses between instructions. Err, it uses read side of xtime lock, so you can not call it from a place which write locks xtime lock. Err, you can -- seqlocks never deadlock. The only thing that doesn't work is to single step with long enough pauses with interrupts on inbetween that the sequence numbers increase: you get a livelock then. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Sat, 2007-02-17 at 15:56 +0100, Andi Kleen wrote: > > This is one of the reasons why we don't just use good old > > do_gettimeofday(), since it takes locks and can lead to lock recursion > > if parts of itself are probed. > > do_gettimeofday doesn't take locks. > > Only restriction is that you can't single step it with long > pauses between instructions. Err, it uses read side of xtime lock, so you can not call it from a place which write locks xtime lock. tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
> This is one of the reasons why we don't just use good old > do_gettimeofday(), since it takes locks and can lead to lock recursion > if parts of itself are probed. do_gettimeofday doesn't take locks. Only restriction is that you can't single step it with long pauses between instructions. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
This is one of the reasons why we don't just use good old do_gettimeofday(), since it takes locks and can lead to lock recursion if parts of itself are probed. do_gettimeofday doesn't take locks. Only restriction is that you can't single step it with long pauses between instructions. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Sat, 2007-02-17 at 15:56 +0100, Andi Kleen wrote: This is one of the reasons why we don't just use good old do_gettimeofday(), since it takes locks and can lead to lock recursion if parts of itself are probed. do_gettimeofday doesn't take locks. Only restriction is that you can't single step it with long pauses between instructions. Err, it uses read side of xtime lock, so you can not call it from a place which write locks xtime lock. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 02:47:45PM -0800, Daniel Walker wrote: > > > Gets pretty ugly .. The clocksource interface already has a positive > > > rating to describe the "best" clocks in the system, which is used to > > > return the "best" clock .. Where the maintainers of the system give each > > > clock a rating. I would imagine most people would just get the so called > > > "best" clock which has the best rating.. > > > > > > I'm starting to think this long flags stringing effect could happen with > > > negative flags also, but it's seems a lot less likely. > > > > The amount of flag stringing should be the same. > > I don't think so .. The common case with negative flags is no flags, > then next would be CLOCKSOURCE_UNSTABLE. At most I would guess two > flags .. The other direction your likely to have people using all flags > most of the time. That's why I showed a function call with all the flags > listed. I think you still misunderstand me. The common case is still no flags. clocksource_get_clock_must_have(0) would return clocks that are stable and unstable. clocksource_get_clock_must_have(CLOCKSOURCE_STABLE) would only return clocks that are stable, just like clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) only returns clocks that are stable. > > > > instead of > > > > > > > > clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) > > > > clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED) > > > > > > > > Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really > > > > anyway to specify that I want a 64bit timer, only a way to specify that > > > > I don't. > > > > > > I might add a way to get specific flags, but I still think the flags > > > should be mostly negative features. > > > > Yeah, the problem is that all of the features are negative except for > > CLOCKSOURCE_64BIT, so you can't mask for it. > > It's meant as a negative feature. So you can mask it if you can't handle > the math .. The only 64bit clock I know off is the tsc, and it's got the > highest rating of all clocks. Ah ok, I see that now. Maybe CLOCKSOURCE_OVER_32BITS would be a better name? It might convey the negativity better... -Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, 2007-02-16 at 17:10 -0500, Jeff Muizelaar wrote: > > I still meant for _with_features to have same semantics so calling: > > clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE > |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS); > > would be equivalent to calling: > > clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_UNSTABLE > |CLOCKSOURC_NOT_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_NOT_CONTINUOUS); > > The only difference is that the naming is reversed. i.e. it isn't a > doulbe negative. Yes, I assumed that .. > perhaps a better name would clocksource_get_clock_must_have() or > clocksource_get_clock_must_be() > > > Gets pretty ugly .. The clocksource interface already has a positive > > rating to describe the "best" clocks in the system, which is used to > > return the "best" clock .. Where the maintainers of the system give each > > clock a rating. I would imagine most people would just get the so called > > "best" clock which has the best rating.. > > > > I'm starting to think this long flags stringing effect could happen with > > negative flags also, but it's seems a lot less likely. > > The amount of flag stringing should be the same. I don't think so .. The common case with negative flags is no flags, then next would be CLOCKSOURCE_UNSTABLE. At most I would guess two flags .. The other direction your likely to have people using all flags most of the time. That's why I showed a function call with all the flags listed. To be clear the clocksources are already sorted with a rating (no flags involved). So for example, On i386 the clocks are sorts as follows, tsc (300) acpi_pm (200) pit (110) jiffies (1) So the bulk of the positive information is already there for you. The tsc clock is by far the best in the system if it's stable. If you ignore the flags, then you will always be given the tsc . > > > instead of > > > > > > clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) > > > clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED) > > > > > > Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really > > > anyway to specify that I want a 64bit timer, only a way to specify that > > > I don't. > > > > I might add a way to get specific flags, but I still think the flags > > should be mostly negative features. > > Yeah, the problem is that all of the features are negative except for > CLOCKSOURCE_64BIT, so you can't mask for it. It's meant as a negative feature. So you can mask it if you can't handle the math .. The only 64bit clock I know off is the tsc, and it's got the highest rating of all clocks. > It might be good if something like these explanations could be added as > comments beside the feature flags. Ok .. On my TODO list .. > So, if I was a systemtap style user of the clocksource api, I'd still have to > do something like: > > init() { > // assume this gives me the tsc > clock = clocksource_get_masked_clock(CLOCKSOURCE_NOT_ATOMIC); > register a cpu_freq notifier > } > > trace() { > trace.time = > > compute_cyc2_ns_by_hand_using_info_from_cpu_freq(clocksource_read(clock)) > } > > In this case it doesn't look like using the clocksource stuff helps this > style of user much as all that is abstracted away is really the reading > the tsc. On the other hand it sounds like if you want all this stuff > taken care of for you, most people should be using do_gettimeofday() > instead. Instead of using an unstable clock and trying to compensate for it, the solution is not to use unstable clocks. The clocksource stuff does help with that.. You can either use the tsc until it becomes unstable, or use the acpi_pm clock which never becomes unstable. Both are used in the same way. > init() { // gives you the acpi_pm clock = clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_NOT_ATOMIC); > register a cpu_freq notifier > } > > trace() { > trace.time = cyc2ns(clock, clocksource_read(clock)) > } > Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 01:06:19PM -0800, Daniel Walker wrote: > On Fri, 2007-02-16 at 14:34 -0500, Jeff Muizelaar wrote: > > It think it would be better if you had sometime like > > 'clocksource_get_clock_with_features()' that took flags describing the > > needed characteristics instead of the unwanted ones. > > > > e.g. > > clocksource_get_clock_with_features(CLOCKSOURCE_STABLE) > > or > > clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED) > > > > One reason I did it the other way is because it's easier to specify what > you know you don't want, than to specify all the things that you know > you do want. > > Almost everyone would want every flags listed, > > clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE > |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS); I still meant for _with_features to have same semantics so calling: clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS); would be equivalent to calling: clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_UNSTABLE |CLOCKSOURC_NOT_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_NOT_CONTINUOUS); The only difference is that the naming is reversed. i.e. it isn't a doulbe negative. perhaps a better name would clocksource_get_clock_must_have() or clocksource_get_clock_must_be() > Gets pretty ugly .. The clocksource interface already has a positive > rating to describe the "best" clocks in the system, which is used to > return the "best" clock .. Where the maintainers of the system give each > clock a rating. I would imagine most people would just get the so called > "best" clock which has the best rating.. > > I'm starting to think this long flags stringing effect could happen with > negative flags also, but it's seems a lot less likely. The amount of flag stringing should be the same. > > > instead of > > > > clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) > > clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED) > > > > Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really > > anyway to specify that I want a 64bit timer, only a way to specify that > > I don't. > > I might add a way to get specific flags, but I still think the flags > should be mostly negative features. Yeah, the problem is that all of the features are negative except for CLOCKSOURCE_64BIT, so you can't mask for it. > > It also isn't clear what the implications of some of the flags are: > > e.g: > > NOT_CONTINUOUS - don't really have any idea what this means. > > It means the clock uses an interrupt to extend it's precision, or it's > not a real cycle counter and depends only on an interrupt for timing. > > > UNSTABLE - this means the frequency can change right? > > Does PM_AFFECTED imply UNSTABLE? > > PM_AFFECTED means it could become unstable, and CLOCKSOURC_UNSTABLE > means it's already become unstable. > > > NOT_ATOMIC - does this affect me as user? > > It could .. The PIT clock for example takes a spinlock. Some users might > not want to use that clock cause they call from a context where that > isn't acceptable (LTT for example). It's also slower to read from. > > > PM_AFFECTED - it looks like the stp code deals with cpu speed > > changing. Does the clocksource code do this for me with > > cyc2ns? If it does are there any reason I would want to > > avoid PM_AFFECTED clocks? If it doesn't how do I know > > that I need to correct it myself. > > There is a block notification system that lets you know when a clock > becomes unstable . cyc2ns doesn't compensate for frequency changes . The > TSC for example could fluctuate frequency pretty often. sched_clock for > example uses the clock no matter what the state is , which is why I > leave unstable clock in the list. It might be good if something like these explanations could be added as comments beside the feature flags. So, if I was a systemtap style user of the clocksource api, I'd still have to do something like: init() { // assume this gives me the tsc clock = clocksource_get_masked_clock(CLOCKSOURCE_NOT_ATOMIC); register a cpu_freq notifier } trace() { trace.time = compute_cyc2_ns_by_hand_using_info_from_cpu_freq(clocksource_read(clock)) } In this case it doesn't look like using the clocksource stuff helps this style of user much as all that is abstracted away is really the reading the tsc. On the other hand it sounds like if you want all this stuff taken care of for you, most people should be using do_gettimeofday() instead. -Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
Hi - On Fri, Feb 16, 2007 at 09:03:23PM +0100, Andi Kleen wrote: > > We in systemtap land have the same problem, and so far made do with > > slightly postprocessed per-cpu TSC values. > > 90+% likely you're not solving your problem correctly this way. Yes, it was done as a last resort. We need facility that we can call from even more demanding contexts than interrupt handlers, considering that kprobes can be placed nearly anywhere. This is one of the reasons why we don't just use good old do_gettimeofday(), since it takes locks and can lead to lock recursion if parts of itself are probed. - FChE - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, 2007-02-16 at 14:34 -0500, Jeff Muizelaar wrote: > On Fri, Feb 16, 2007 at 10:28:50AM -0800, Daniel Walker wrote: > > On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote: > > > On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote: > > > > I've been working on a patch set (below), to expose the clocksources > > > > used by generic time to multiple users . It would allow timestamps from > > > > different clocks in a generic way. It's not merged, but I'd appreciate > > > > any input either of you might have.. > > > > > > > > ftp://source.mvista.com/pub/dwalker/clocksource/ > > > > > > Is it possible to see the resulting clocksource.h and maybe > > > clocksource.c after the patch set? That would make looking at it much > > > easier. > > > > Well you could just apply the patch set, but I stuck them in the same > > directory as above .. I'll delete them in 24 hours or so .. > > > > At one point I replaced sched_clock() , > > > > ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch > > > > The API is similar to that version, and sched_clock was the simplest > > user of the API that I've done. > > Ok, so it would basically be: > > init() > { > clocksource *clock = clocksource_get_best_clock(); > } > > trace() > { > trace.time = cyc2ns(clock, clocksource_read(clock)); > } > > This seems pretty sane to me. sched_clock has the down side of not taking into account rollover .. The scheduler doesn't care if it gets a few bad timestamps now and then, but that might screw up other code.. Also you could read cycles , and then later convert to nanoseconds when it's needed .. Which makes taking the timestamps a little faster. > The blk-trace code calibrates a per cpu offset for the sched_clock() > time (see blk_trace_check_cpu_time and blk_trace_set_ht_offsets). Does > the clocksource stuff help me with this or would I still need to do > something like that? Most of the clocksources are not per cpu .. But the tsc clocksource doesn't keep any per cpu offset information. There has been some work on syncing the TSC across cpus so I'm not sure to what extend an offset would be needed. > I also noticed that you have a clocksource_get_masked_clock() call. This > seems like a pretty awkward API to me. The first thing that came to mind > when I read the name was 'what is a masked clock'. When I realized that > it meant 'a clock w/o this flag', I still found it awkward that one has > to specify what that don't want. e.g 'I don't want a clock that is not > continuous.' yeah it is a little strange . > It think it would be better if you had sometime like > 'clocksource_get_clock_with_features()' that took flags describing the > needed characteristics instead of the unwanted ones. > > e.g. > clocksource_get_clock_with_features(CLOCKSOURCE_STABLE) > or > clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED) > One reason I did it the other way is because it's easier to specify what you know you don't want, than to specify all the things that you know you do want. Almost everyone would want every flags listed, clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS); Gets pretty ugly .. The clocksource interface already has a positive rating to describe the "best" clocks in the system, which is used to return the "best" clock .. Where the maintainers of the system give each clock a rating. I would imagine most people would just get the so called "best" clock which has the best rating.. I'm starting to think this long flags stringing effect could happen with negative flags also, but it's seems a lot less likely. > instead of > > clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) > clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED) > > Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really > anyway to specify that I want a 64bit timer, only a way to specify that > I don't. I might add a way to get specific flags, but I still think the flags should be mostly negative features. > It also isn't clear what the implications of some of the flags are: > e.g: > NOT_CONTINUOUS - don't really have any idea what this means. It means the clock uses an interrupt to extend it's precision, or it's not a real cycle counter and depends only on an interrupt for timing. > UNSTABLE - this means the frequency can change right? > Does PM_AFFECTED imply UNSTABLE? PM_AFFECTED means it could become unstable, and CLOCKSOURC_UNSTABLE means it's already become unstable. > NOT_ATOMIC - does this affect me as user? It could .. The PIT clock for example takes a spinlock. Some users might not want to use that clock cause they call from a context where that isn't acceptable (LTT for example). It's also slower to read from. > PM_AFFECTED - it looks like the stp code deals with cpu speed > changing.
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 10:44:15AM -0800, Randy Dunlap wrote: > On Fri, 16 Feb 2007 13:30:14 -0500 Jeff Muizelaar wrote: > > > On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote: > > > Jeff Muizelaar <[EMAIL PROTECTED]> writes: > > > > > > > I've built a tool with the goal of logging mmio writes and reads by > > > > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace. > > > > > > FWIW, this is exactly a type of add-on trace patch that could be > > > mooted by adoption of the ltt/systemtap "marker" facility. With it, > > > you would not need so much code (e.g. no new user-space tools at all, > > > reuse of common tracing buffer logic, permanently placed hooks) and > > > would probably get more utility. > > > > Is there more information on this "marker" facility? e.g. what is a > > marker? Are they just like tracepoints? > > On lkml for the past few days/months. > Look for "Linux Kernel Markers" in the subject line. > > Mathieu, do you have a web site for LK Markers? > Yeah, so if I understand correctly, markers are basically compile time locations that you can attach function calls to at run time, right?. If so, I don't think they are of much use to me. What I am doing is page-faulting on every read or write to an mmio region, decoding the faulting instruction and passing the decoded information up to userspace through relayfs. -Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 09:02:44PM +0100, Andi Kleen wrote: > Jeff Muizelaar <[EMAIL PROTECTED]> writes: > > > > The question is, what api should I be using? I need something that can > > be called from inside interrupt handlers, and obviously the more > > accurate and the lower the overhead the better. > > Use do_gettimeofday(). sched_clock() is not for general use > and only for some very limited use cases and will give you > unexpected results in several cases. > > There are a few cases where gtod is still a little slow, but these > are being addressed. In many cases it is fast. In some hardware > it stays slow, but there is not much that can be done about that > because of the hardware design. > > It works fine from interrupt handlers and other strange contexts. Ok, I'll probably use it then. How does the overhead of calling do_gettimteofday() compare to doing an mmio read/write over PCI express? i.e. is it going to be a performance problem if I call do_gettimeofday for every mmio read/write? Also, is there any good reason blk-trace doesn't use do_gettimeofday()? -Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 10:28:50AM -0800, Daniel Walker wrote: > On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote: > > On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote: > > > I've been working on a patch set (below), to expose the clocksources > > > used by generic time to multiple users . It would allow timestamps from > > > different clocks in a generic way. It's not merged, but I'd appreciate > > > any input either of you might have.. > > > > > > ftp://source.mvista.com/pub/dwalker/clocksource/ > > > > Is it possible to see the resulting clocksource.h and maybe > > clocksource.c after the patch set? That would make looking at it much > > easier. > > Well you could just apply the patch set, but I stuck them in the same > directory as above .. I'll delete them in 24 hours or so .. > > At one point I replaced sched_clock() , > > ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch > > The API is similar to that version, and sched_clock was the simplest > user of the API that I've done. Ok, so it would basically be: init() { clocksource *clock = clocksource_get_best_clock(); } trace() { trace.time = cyc2ns(clock, clocksource_read(clock)); } This seems pretty sane to me. The blk-trace code calibrates a per cpu offset for the sched_clock() time (see blk_trace_check_cpu_time and blk_trace_set_ht_offsets). Does the clocksource stuff help me with this or would I still need to do something like that? I also noticed that you have a clocksource_get_masked_clock() call. This seems like a pretty awkward API to me. The first thing that came to mind when I read the name was 'what is a masked clock'. When I realized that it meant 'a clock w/o this flag', I still found it awkward that one has to specify what that don't want. e.g 'I don't want a clock that is not continuous.' It think it would be better if you had sometime like 'clocksource_get_clock_with_features()' that took flags describing the needed characteristics instead of the unwanted ones. e.g. clocksource_get_clock_with_features(CLOCKSOURCE_STABLE) or clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED) instead of: clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED) Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really anyway to specify that I want a 64bit timer, only a way to specify that I don't. It also isn't clear what the implications of some of the flags are: e.g: NOT_CONTINUOUS - don't really have any idea what this means. UNSTABLE - this means the frequency can change right? Does PM_AFFECTED imply UNSTABLE? NOT_ATOMIC - does this affect me as user? PM_AFFECTED - it looks like the stp code deals with cpu speed changing. Does the clocksource code do this for me with cyc2ns? If it does are there any reason I would want to avoid PM_AFFECTED clocks? If it doesn't how do I know that I need to correct it myself. Hope this helps, -Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
[EMAIL PROTECTED] (Frank Ch. Eigler) writes: > > We in systemtap land have the same problem, and so far made do with > slightly postprocessed per-cpu TSC values. 90+% likely you're not solving your problem correctly this way. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
Jeff Muizelaar <[EMAIL PROTECTED]> writes: > > The question is, what api should I be using? I need something that can > be called from inside interrupt handlers, and obviously the more > accurate and the lower the overhead the better. Use do_gettimeofday(). sched_clock() is not for general use and only for some very limited use cases and will give you unexpected results in several cases. There are a few cases where gtod is still a little slow, but these are being addressed. In many cases it is fast. In some hardware it stays slow, but there is not much that can be done about that because of the hardware design. It works fine from interrupt handlers and other strange contexts. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, 16 Feb 2007 13:30:14 -0500 Jeff Muizelaar wrote: > On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote: > > Jeff Muizelaar <[EMAIL PROTECTED]> writes: > > > > > I've built a tool with the goal of logging mmio writes and reads by > > > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace. > > > > FWIW, this is exactly a type of add-on trace patch that could be > > mooted by adoption of the ltt/systemtap "marker" facility. With it, > > you would not need so much code (e.g. no new user-space tools at all, > > reuse of common tracing buffer logic, permanently placed hooks) and > > would probably get more utility. > > Is there more information on this "marker" facility? e.g. what is a > marker? Are they just like tracepoints? On lkml for the past few days/months. Look for "Linux Kernel Markers" in the subject line. Mathieu, do you have a web site for LK Markers? --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote: > On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote: > > I've been working on a patch set (below), to expose the clocksources > > used by generic time to multiple users . It would allow timestamps from > > different clocks in a generic way. It's not merged, but I'd appreciate > > any input either of you might have.. > > > > ftp://source.mvista.com/pub/dwalker/clocksource/ > > Is it possible to see the resulting clocksource.h and maybe > clocksource.c after the patch set? That would make looking at it much > easier. Well you could just apply the patch set, but I stuck them in the same directory as above .. I'll delete them in 24 hours or so .. At one point I replaced sched_clock() , ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch The API is similar to that version, and sched_clock was the simplest user of the API that I've done. Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote: > Jeff Muizelaar <[EMAIL PROTECTED]> writes: > > > I've built a tool with the goal of logging mmio writes and reads by > > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace. > > FWIW, this is exactly a type of add-on trace patch that could be > mooted by adoption of the ltt/systemtap "marker" facility. With it, > you would not need so much code (e.g. no new user-space tools at all, > reuse of common tracing buffer logic, permanently placed hooks) and > would probably get more utility. Is there more information on this "marker" facility? e.g. what is a marker? Are they just like tracepoints? > > [...] The question is, what [timer] api should I be using? I need > > something that can be called from inside interrupt handlers, and > > obviously the more accurate and the lower the overhead the better. > > We in systemtap land have the same problem, and so far made do with > slightly postprocessed per-cpu TSC values. Thanks, I'll have a look at systemtap does. It's the stuff src/runtime/time.c right? Also, you might want to move this stuff from the README to someplace more prominent on the systemtap website. It's nice not to have to dig far to get the source... Download systemtap sources snapshot or from CVS: ftp://sources.redhat.com/pub/systemtap/snapshots/ (or) cvs -d :pserver:[EMAIL PROTECTED]:/cvs/systemtap login # enter "anoncvs" as the password cvs -d :pserver:[EMAIL PROTECTED]:/cvs/systemtap co src -Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote: > I've been working on a patch set (below), to expose the clocksources > used by generic time to multiple users . It would allow timestamps from > different clocks in a generic way. It's not merged, but I'd appreciate > any input either of you might have.. > > ftp://source.mvista.com/pub/dwalker/clocksource/ Is it possible to see the resulting clocksource.h and maybe clocksource.c after the patch set? That would make looking at it much easier. -Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, 2007-02-16 at 11:30 -0500, Frank Ch. Eigler wrote: > Jeff Muizelaar <[EMAIL PROTECTED]> writes: > > > I've built a tool with the goal of logging mmio writes and reads by > > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace. > > FWIW, this is exactly a type of add-on trace patch that could be > mooted by adoption of the ltt/systemtap "marker" facility. With it, > you would not need so much code (e.g. no new user-space tools at all, > reuse of common tracing buffer logic, permanently placed hooks) and > would probably get more utility. > > > [...] The question is, what [timer] api should I be using? I need > > something that can be called from inside interrupt handlers, and > > obviously the more accurate and the lower the overhead the better. > > We in systemtap land have the same problem, and so far made do with > slightly postprocessed per-cpu TSC values. I've been working on a patch set (below), to expose the clocksources used by generic time to multiple users . It would allow timestamps from different clocks in a generic way. It's not merged, but I'd appreciate any input either of you might have.. ftp://source.mvista.com/pub/dwalker/clocksource/ Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
Jeff Muizelaar <[EMAIL PROTECTED]> writes: > I've built a tool with the goal of logging mmio writes and reads by > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace. FWIW, this is exactly a type of add-on trace patch that could be mooted by adoption of the ltt/systemtap "marker" facility. With it, you would not need so much code (e.g. no new user-space tools at all, reuse of common tracing buffer logic, permanently placed hooks) and would probably get more utility. > [...] The question is, what [timer] api should I be using? I need > something that can be called from inside interrupt handlers, and > obviously the more accurate and the lower the overhead the better. We in systemtap land have the same problem, and so far made do with slightly postprocessed per-cpu TSC values. - FChE - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
Jeff Muizelaar [EMAIL PROTECTED] writes: I've built a tool with the goal of logging mmio writes and reads by device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace. FWIW, this is exactly a type of add-on trace patch that could be mooted by adoption of the ltt/systemtap marker facility. With it, you would not need so much code (e.g. no new user-space tools at all, reuse of common tracing buffer logic, permanently placed hooks) and would probably get more utility. [...] The question is, what [timer] api should I be using? I need something that can be called from inside interrupt handlers, and obviously the more accurate and the lower the overhead the better. We in systemtap land have the same problem, and so far made do with slightly postprocessed per-cpu TSC values. - FChE - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, 2007-02-16 at 11:30 -0500, Frank Ch. Eigler wrote: Jeff Muizelaar [EMAIL PROTECTED] writes: I've built a tool with the goal of logging mmio writes and reads by device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace. FWIW, this is exactly a type of add-on trace patch that could be mooted by adoption of the ltt/systemtap marker facility. With it, you would not need so much code (e.g. no new user-space tools at all, reuse of common tracing buffer logic, permanently placed hooks) and would probably get more utility. [...] The question is, what [timer] api should I be using? I need something that can be called from inside interrupt handlers, and obviously the more accurate and the lower the overhead the better. We in systemtap land have the same problem, and so far made do with slightly postprocessed per-cpu TSC values. I've been working on a patch set (below), to expose the clocksources used by generic time to multiple users . It would allow timestamps from different clocks in a generic way. It's not merged, but I'd appreciate any input either of you might have.. ftp://source.mvista.com/pub/dwalker/clocksource/ Daniel - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote: I've been working on a patch set (below), to expose the clocksources used by generic time to multiple users . It would allow timestamps from different clocks in a generic way. It's not merged, but I'd appreciate any input either of you might have.. ftp://source.mvista.com/pub/dwalker/clocksource/ Is it possible to see the resulting clocksource.h and maybe clocksource.c after the patch set? That would make looking at it much easier. -Jeff - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote: Jeff Muizelaar [EMAIL PROTECTED] writes: I've built a tool with the goal of logging mmio writes and reads by device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace. FWIW, this is exactly a type of add-on trace patch that could be mooted by adoption of the ltt/systemtap marker facility. With it, you would not need so much code (e.g. no new user-space tools at all, reuse of common tracing buffer logic, permanently placed hooks) and would probably get more utility. Is there more information on this marker facility? e.g. what is a marker? Are they just like tracepoints? [...] The question is, what [timer] api should I be using? I need something that can be called from inside interrupt handlers, and obviously the more accurate and the lower the overhead the better. We in systemtap land have the same problem, and so far made do with slightly postprocessed per-cpu TSC values. Thanks, I'll have a look at systemtap does. It's the stuff src/runtime/time.c right? Also, you might want to move this stuff from the README to someplace more prominent on the systemtap website. It's nice not to have to dig far to get the source... Download systemtap sources snapshot or from CVS: ftp://sources.redhat.com/pub/systemtap/snapshots/ (or) cvs -d :pserver:[EMAIL PROTECTED]:/cvs/systemtap login # enter anoncvs as the password cvs -d :pserver:[EMAIL PROTECTED]:/cvs/systemtap co src -Jeff - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote: On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote: I've been working on a patch set (below), to expose the clocksources used by generic time to multiple users . It would allow timestamps from different clocks in a generic way. It's not merged, but I'd appreciate any input either of you might have.. ftp://source.mvista.com/pub/dwalker/clocksource/ Is it possible to see the resulting clocksource.h and maybe clocksource.c after the patch set? That would make looking at it much easier. Well you could just apply the patch set, but I stuck them in the same directory as above .. I'll delete them in 24 hours or so .. At one point I replaced sched_clock() , ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch The API is similar to that version, and sched_clock was the simplest user of the API that I've done. Daniel - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, 16 Feb 2007 13:30:14 -0500 Jeff Muizelaar wrote: On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote: Jeff Muizelaar [EMAIL PROTECTED] writes: I've built a tool with the goal of logging mmio writes and reads by device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace. FWIW, this is exactly a type of add-on trace patch that could be mooted by adoption of the ltt/systemtap marker facility. With it, you would not need so much code (e.g. no new user-space tools at all, reuse of common tracing buffer logic, permanently placed hooks) and would probably get more utility. Is there more information on this marker facility? e.g. what is a marker? Are they just like tracepoints? On lkml for the past few days/months. Look for Linux Kernel Markers in the subject line. Mathieu, do you have a web site for LK Markers? --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
Jeff Muizelaar [EMAIL PROTECTED] writes: The question is, what api should I be using? I need something that can be called from inside interrupt handlers, and obviously the more accurate and the lower the overhead the better. Use do_gettimeofday(). sched_clock() is not for general use and only for some very limited use cases and will give you unexpected results in several cases. There are a few cases where gtod is still a little slow, but these are being addressed. In many cases it is fast. In some hardware it stays slow, but there is not much that can be done about that because of the hardware design. It works fine from interrupt handlers and other strange contexts. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
[EMAIL PROTECTED] (Frank Ch. Eigler) writes: We in systemtap land have the same problem, and so far made do with slightly postprocessed per-cpu TSC values. 90+% likely you're not solving your problem correctly this way. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 10:28:50AM -0800, Daniel Walker wrote: On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote: On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote: I've been working on a patch set (below), to expose the clocksources used by generic time to multiple users . It would allow timestamps from different clocks in a generic way. It's not merged, but I'd appreciate any input either of you might have.. ftp://source.mvista.com/pub/dwalker/clocksource/ Is it possible to see the resulting clocksource.h and maybe clocksource.c after the patch set? That would make looking at it much easier. Well you could just apply the patch set, but I stuck them in the same directory as above .. I'll delete them in 24 hours or so .. At one point I replaced sched_clock() , ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch The API is similar to that version, and sched_clock was the simplest user of the API that I've done. Ok, so it would basically be: init() { clocksource *clock = clocksource_get_best_clock(); } trace() { trace.time = cyc2ns(clock, clocksource_read(clock)); } This seems pretty sane to me. The blk-trace code calibrates a per cpu offset for the sched_clock() time (see blk_trace_check_cpu_time and blk_trace_set_ht_offsets). Does the clocksource stuff help me with this or would I still need to do something like that? I also noticed that you have a clocksource_get_masked_clock() call. This seems like a pretty awkward API to me. The first thing that came to mind when I read the name was 'what is a masked clock'. When I realized that it meant 'a clock w/o this flag', I still found it awkward that one has to specify what that don't want. e.g 'I don't want a clock that is not continuous.' It think it would be better if you had sometime like 'clocksource_get_clock_with_features()' that took flags describing the needed characteristics instead of the unwanted ones. e.g. clocksource_get_clock_with_features(CLOCKSOURCE_STABLE) or clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED) instead of: clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED) Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really anyway to specify that I want a 64bit timer, only a way to specify that I don't. It also isn't clear what the implications of some of the flags are: e.g: NOT_CONTINUOUS - don't really have any idea what this means. UNSTABLE - this means the frequency can change right? Does PM_AFFECTED imply UNSTABLE? NOT_ATOMIC - does this affect me as user? PM_AFFECTED - it looks like the stp code deals with cpu speed changing. Does the clocksource code do this for me with cyc2ns? If it does are there any reason I would want to avoid PM_AFFECTED clocks? If it doesn't how do I know that I need to correct it myself. Hope this helps, -Jeff - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 09:02:44PM +0100, Andi Kleen wrote: Jeff Muizelaar [EMAIL PROTECTED] writes: The question is, what api should I be using? I need something that can be called from inside interrupt handlers, and obviously the more accurate and the lower the overhead the better. Use do_gettimeofday(). sched_clock() is not for general use and only for some very limited use cases and will give you unexpected results in several cases. There are a few cases where gtod is still a little slow, but these are being addressed. In many cases it is fast. In some hardware it stays slow, but there is not much that can be done about that because of the hardware design. It works fine from interrupt handlers and other strange contexts. Ok, I'll probably use it then. How does the overhead of calling do_gettimteofday() compare to doing an mmio read/write over PCI express? i.e. is it going to be a performance problem if I call do_gettimeofday for every mmio read/write? Also, is there any good reason blk-trace doesn't use do_gettimeofday()? -Jeff - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 10:44:15AM -0800, Randy Dunlap wrote: On Fri, 16 Feb 2007 13:30:14 -0500 Jeff Muizelaar wrote: On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote: Jeff Muizelaar [EMAIL PROTECTED] writes: I've built a tool with the goal of logging mmio writes and reads by device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace. FWIW, this is exactly a type of add-on trace patch that could be mooted by adoption of the ltt/systemtap marker facility. With it, you would not need so much code (e.g. no new user-space tools at all, reuse of common tracing buffer logic, permanently placed hooks) and would probably get more utility. Is there more information on this marker facility? e.g. what is a marker? Are they just like tracepoints? On lkml for the past few days/months. Look for Linux Kernel Markers in the subject line. Mathieu, do you have a web site for LK Markers? Yeah, so if I understand correctly, markers are basically compile time locations that you can attach function calls to at run time, right?. If so, I don't think they are of much use to me. What I am doing is page-faulting on every read or write to an mmio region, decoding the faulting instruction and passing the decoded information up to userspace through relayfs. -Jeff - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, 2007-02-16 at 14:34 -0500, Jeff Muizelaar wrote: On Fri, Feb 16, 2007 at 10:28:50AM -0800, Daniel Walker wrote: On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote: On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote: I've been working on a patch set (below), to expose the clocksources used by generic time to multiple users . It would allow timestamps from different clocks in a generic way. It's not merged, but I'd appreciate any input either of you might have.. ftp://source.mvista.com/pub/dwalker/clocksource/ Is it possible to see the resulting clocksource.h and maybe clocksource.c after the patch set? That would make looking at it much easier. Well you could just apply the patch set, but I stuck them in the same directory as above .. I'll delete them in 24 hours or so .. At one point I replaced sched_clock() , ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch The API is similar to that version, and sched_clock was the simplest user of the API that I've done. Ok, so it would basically be: init() { clocksource *clock = clocksource_get_best_clock(); } trace() { trace.time = cyc2ns(clock, clocksource_read(clock)); } This seems pretty sane to me. sched_clock has the down side of not taking into account rollover .. The scheduler doesn't care if it gets a few bad timestamps now and then, but that might screw up other code.. Also you could read cycles , and then later convert to nanoseconds when it's needed .. Which makes taking the timestamps a little faster. The blk-trace code calibrates a per cpu offset for the sched_clock() time (see blk_trace_check_cpu_time and blk_trace_set_ht_offsets). Does the clocksource stuff help me with this or would I still need to do something like that? Most of the clocksources are not per cpu .. But the tsc clocksource doesn't keep any per cpu offset information. There has been some work on syncing the TSC across cpus so I'm not sure to what extend an offset would be needed. I also noticed that you have a clocksource_get_masked_clock() call. This seems like a pretty awkward API to me. The first thing that came to mind when I read the name was 'what is a masked clock'. When I realized that it meant 'a clock w/o this flag', I still found it awkward that one has to specify what that don't want. e.g 'I don't want a clock that is not continuous.' yeah it is a little strange . It think it would be better if you had sometime like 'clocksource_get_clock_with_features()' that took flags describing the needed characteristics instead of the unwanted ones. e.g. clocksource_get_clock_with_features(CLOCKSOURCE_STABLE) or clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED) One reason I did it the other way is because it's easier to specify what you know you don't want, than to specify all the things that you know you do want. Almost everyone would want every flags listed, clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS); Gets pretty ugly .. The clocksource interface already has a positive rating to describe the best clocks in the system, which is used to return the best clock .. Where the maintainers of the system give each clock a rating. I would imagine most people would just get the so called best clock which has the best rating.. I'm starting to think this long flags stringing effect could happen with negative flags also, but it's seems a lot less likely. instead of clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED) Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really anyway to specify that I want a 64bit timer, only a way to specify that I don't. I might add a way to get specific flags, but I still think the flags should be mostly negative features. It also isn't clear what the implications of some of the flags are: e.g: NOT_CONTINUOUS - don't really have any idea what this means. It means the clock uses an interrupt to extend it's precision, or it's not a real cycle counter and depends only on an interrupt for timing. UNSTABLE - this means the frequency can change right? Does PM_AFFECTED imply UNSTABLE? PM_AFFECTED means it could become unstable, and CLOCKSOURC_UNSTABLE means it's already become unstable. NOT_ATOMIC - does this affect me as user? It could .. The PIT clock for example takes a spinlock. Some users might not want to use that clock cause they call from a context where that isn't acceptable (LTT for example). It's also slower to read from. PM_AFFECTED - it looks like the stp code deals with cpu speed changing. Does the clocksource code do this for me with cyc2ns? If it does are there any reason I would want to
Re: Using sched_clock for mmio-trace
Hi - On Fri, Feb 16, 2007 at 09:03:23PM +0100, Andi Kleen wrote: We in systemtap land have the same problem, and so far made do with slightly postprocessed per-cpu TSC values. 90+% likely you're not solving your problem correctly this way. Yes, it was done as a last resort. We need facility that we can call from even more demanding contexts than interrupt handlers, considering that kprobes can be placed nearly anywhere. This is one of the reasons why we don't just use good old do_gettimeofday(), since it takes locks and can lead to lock recursion if parts of itself are probed. - FChE - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 01:06:19PM -0800, Daniel Walker wrote: On Fri, 2007-02-16 at 14:34 -0500, Jeff Muizelaar wrote: It think it would be better if you had sometime like 'clocksource_get_clock_with_features()' that took flags describing the needed characteristics instead of the unwanted ones. e.g. clocksource_get_clock_with_features(CLOCKSOURCE_STABLE) or clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED) One reason I did it the other way is because it's easier to specify what you know you don't want, than to specify all the things that you know you do want. Almost everyone would want every flags listed, clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS); I still meant for _with_features to have same semantics so calling: clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS); would be equivalent to calling: clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_UNSTABLE |CLOCKSOURC_NOT_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_NOT_CONTINUOUS); The only difference is that the naming is reversed. i.e. it isn't a doulbe negative. perhaps a better name would clocksource_get_clock_must_have() or clocksource_get_clock_must_be() Gets pretty ugly .. The clocksource interface already has a positive rating to describe the best clocks in the system, which is used to return the best clock .. Where the maintainers of the system give each clock a rating. I would imagine most people would just get the so called best clock which has the best rating.. I'm starting to think this long flags stringing effect could happen with negative flags also, but it's seems a lot less likely. The amount of flag stringing should be the same. instead of clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED) Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really anyway to specify that I want a 64bit timer, only a way to specify that I don't. I might add a way to get specific flags, but I still think the flags should be mostly negative features. Yeah, the problem is that all of the features are negative except for CLOCKSOURCE_64BIT, so you can't mask for it. It also isn't clear what the implications of some of the flags are: e.g: NOT_CONTINUOUS - don't really have any idea what this means. It means the clock uses an interrupt to extend it's precision, or it's not a real cycle counter and depends only on an interrupt for timing. UNSTABLE - this means the frequency can change right? Does PM_AFFECTED imply UNSTABLE? PM_AFFECTED means it could become unstable, and CLOCKSOURC_UNSTABLE means it's already become unstable. NOT_ATOMIC - does this affect me as user? It could .. The PIT clock for example takes a spinlock. Some users might not want to use that clock cause they call from a context where that isn't acceptable (LTT for example). It's also slower to read from. PM_AFFECTED - it looks like the stp code deals with cpu speed changing. Does the clocksource code do this for me with cyc2ns? If it does are there any reason I would want to avoid PM_AFFECTED clocks? If it doesn't how do I know that I need to correct it myself. There is a block notification system that lets you know when a clock becomes unstable . cyc2ns doesn't compensate for frequency changes . The TSC for example could fluctuate frequency pretty often. sched_clock for example uses the clock no matter what the state is , which is why I leave unstable clock in the list. It might be good if something like these explanations could be added as comments beside the feature flags. So, if I was a systemtap style user of the clocksource api, I'd still have to do something like: init() { // assume this gives me the tsc clock = clocksource_get_masked_clock(CLOCKSOURCE_NOT_ATOMIC); register a cpu_freq notifier } trace() { trace.time = compute_cyc2_ns_by_hand_using_info_from_cpu_freq(clocksource_read(clock)) } In this case it doesn't look like using the clocksource stuff helps this style of user much as all that is abstracted away is really the reading the tsc. On the other hand it sounds like if you want all this stuff taken care of for you, most people should be using do_gettimeofday() instead. -Jeff - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, 2007-02-16 at 17:10 -0500, Jeff Muizelaar wrote: I still meant for _with_features to have same semantics so calling: clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS); would be equivalent to calling: clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_UNSTABLE |CLOCKSOURC_NOT_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_NOT_CONTINUOUS); The only difference is that the naming is reversed. i.e. it isn't a doulbe negative. Yes, I assumed that .. perhaps a better name would clocksource_get_clock_must_have() or clocksource_get_clock_must_be() Gets pretty ugly .. The clocksource interface already has a positive rating to describe the best clocks in the system, which is used to return the best clock .. Where the maintainers of the system give each clock a rating. I would imagine most people would just get the so called best clock which has the best rating.. I'm starting to think this long flags stringing effect could happen with negative flags also, but it's seems a lot less likely. The amount of flag stringing should be the same. I don't think so .. The common case with negative flags is no flags, then next would be CLOCKSOURCE_UNSTABLE. At most I would guess two flags .. The other direction your likely to have people using all flags most of the time. That's why I showed a function call with all the flags listed. To be clear the clocksources are already sorted with a rating (no flags involved). So for example, On i386 the clocks are sorts as follows, tsc (300) acpi_pm (200) pit (110) jiffies (1) So the bulk of the positive information is already there for you. The tsc clock is by far the best in the system if it's stable. If you ignore the flags, then you will always be given the tsc . instead of clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED) Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really anyway to specify that I want a 64bit timer, only a way to specify that I don't. I might add a way to get specific flags, but I still think the flags should be mostly negative features. Yeah, the problem is that all of the features are negative except for CLOCKSOURCE_64BIT, so you can't mask for it. It's meant as a negative feature. So you can mask it if you can't handle the math .. The only 64bit clock I know off is the tsc, and it's got the highest rating of all clocks. It might be good if something like these explanations could be added as comments beside the feature flags. Ok .. On my TODO list .. So, if I was a systemtap style user of the clocksource api, I'd still have to do something like: init() { // assume this gives me the tsc clock = clocksource_get_masked_clock(CLOCKSOURCE_NOT_ATOMIC); register a cpu_freq notifier } trace() { trace.time = compute_cyc2_ns_by_hand_using_info_from_cpu_freq(clocksource_read(clock)) } In this case it doesn't look like using the clocksource stuff helps this style of user much as all that is abstracted away is really the reading the tsc. On the other hand it sounds like if you want all this stuff taken care of for you, most people should be using do_gettimeofday() instead. Instead of using an unstable clock and trying to compensate for it, the solution is not to use unstable clocks. The clocksource stuff does help with that.. You can either use the tsc until it becomes unstable, or use the acpi_pm clock which never becomes unstable. Both are used in the same way. init() { // gives you the acpi_pm clock = clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_NOT_ATOMIC); register a cpu_freq notifier } trace() { trace.time = cyc2ns(clock, clocksource_read(clock)) } Daniel - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Using sched_clock for mmio-trace
On Fri, Feb 16, 2007 at 02:47:45PM -0800, Daniel Walker wrote: Gets pretty ugly .. The clocksource interface already has a positive rating to describe the best clocks in the system, which is used to return the best clock .. Where the maintainers of the system give each clock a rating. I would imagine most people would just get the so called best clock which has the best rating.. I'm starting to think this long flags stringing effect could happen with negative flags also, but it's seems a lot less likely. The amount of flag stringing should be the same. I don't think so .. The common case with negative flags is no flags, then next would be CLOCKSOURCE_UNSTABLE. At most I would guess two flags .. The other direction your likely to have people using all flags most of the time. That's why I showed a function call with all the flags listed. I think you still misunderstand me. The common case is still no flags. clocksource_get_clock_must_have(0) would return clocks that are stable and unstable. clocksource_get_clock_must_have(CLOCKSOURCE_STABLE) would only return clocks that are stable, just like clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) only returns clocks that are stable. instead of clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED) Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really anyway to specify that I want a 64bit timer, only a way to specify that I don't. I might add a way to get specific flags, but I still think the flags should be mostly negative features. Yeah, the problem is that all of the features are negative except for CLOCKSOURCE_64BIT, so you can't mask for it. It's meant as a negative feature. So you can mask it if you can't handle the math .. The only 64bit clock I know off is the tsc, and it's got the highest rating of all clocks. Ah ok, I see that now. Maybe CLOCKSOURCE_OVER_32BITS would be a better name? It might convey the negativity better... -Jeff - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Using sched_clock for mmio-trace
I've built a tool with the goal of logging mmio writes and reads by device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace. I'd like to add support for recording a time stamp on each read and write. Unfortunately, I am not sure which clock api I should use. I had a look at blktrace and saw that it uses 'sched_clock()' for time stamps. However, this symbol is not exported to modules, and from what I've read it sounds like its use is discouraged. The question is, what api should I be using? I need something that can be called from inside interrupt handlers, and obviously the more accurate and the lower the overhead the better. -Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Using sched_clock for mmio-trace
I've built a tool with the goal of logging mmio writes and reads by device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace. I'd like to add support for recording a time stamp on each read and write. Unfortunately, I am not sure which clock api I should use. I had a look at blktrace and saw that it uses 'sched_clock()' for time stamps. However, this symbol is not exported to modules, and from what I've read it sounds like its use is discouraged. The question is, what api should I be using? I need something that can be called from inside interrupt handlers, and obviously the more accurate and the lower the overhead the better. -Jeff - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/