Re: Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
(2014/11/05 0:51), Pawel Moll wrote:
> On Tue, 2014-11-04 at 09:24 +, Masami Hiramatsu wrote:
>> What I'd like to do is the binary version of ftrace-marker, the text
>> version is already supported by qemu (see below).
>> https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg00505.html
>>
>> But since that is just a string data (not structured data), it is hard to
>> analyze via perf-script or some other useful filters/triggers in ftrace.
>>
>> In my idea, the new event will be defined via a special file in debugfs like
>> kprobe-events, like below.
>>
>> # cd $debugfs/tracing
>> # echo "newgrp/newevent signarg:s32 flag:u64" >> marker_events
>> # cat events/newgrp/newevent/format
>> name: newevent
>> ID: 2048
>> format:
>>     field:unsigned short common_type; offset:0; size:2; signed:0;
>>     field:unsigned char common_flags; offset:2; size:1; signed:0;
>>     field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
>>     field:int common_pid; offset:4; size:4; signed:1;
>>
>>     field:s32 signarg; offset:8; size:4; signed:1;
>>     field:u64 flag; offset:12; size:8; signed:0;
>>
>> print fmt: "signarg=%d flag=0x%Lx", REC->signarg, REC->flag
>>
>> Then, users will write the data (excluded common fields) when the event
>> happens via trace_marker, which starts with '\0'ID(in u32). Kernel just
>> checks the ID and its data size, but doesn't parse, filter/trigger it,
>> and logs it into the kernel buffer.
>
> Very neat, I like it! Certainly useful with scripting. Any gut feeling
> regarding the kernel version it will be ready for? 3.19 or later than
> that?

Thanks, and not yet implemented, I'd like to ask people about the format
etc. before that :)

>> Of course, this has a downside that the user must have a privilege to access
>> to debugfs.
>> Thus maybe we need both of prctl() IF for perf and this IF for ftrace.
>
> I don't have any particularly strong feelings about the solution as long
> as I'm able to create this "synchronisation point" of mine in the perf
> data. In one of this patch's previous incarnations I was also doing a
> write() to the perf fd to achieve pretty much the same result.
>
> In my personal use case root access to debugfs isn't a problem (I need
> it for other ftrace operations anyway). However Ingo and some other guys
> seemed interested in prctl() approach because: 1. it's much simpler to
> use even comparing with simple trace_marker's open(path)/write()/close()
> and 2. because any process can do it at any time and the results are
> quietly discarded if no one is listening. I also remember that when I
> proposed sort of "unification" between trace_marker and the uevents,
> Ingo straight away "suggested" keeping it separate.

Agreed, I think we can keep trace_marker opened (so application will just
need to write() the events), but for the second reason, prctl will be
better for per-application usage.
Actually, ftrace is "system-wide" oriented, but the perf is not.

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Tuesday 04 November 2014 11:49:04 Thomas Gleixner wrote:
> On Tue, 4 Nov 2014, Richard Cochran wrote:
>
> > On Tue, Nov 04, 2014 at 09:01:31AM +0100, Arnd Bergmann wrote:
> > > On Monday 03 November 2014 17:11:53 John Stultz wrote:
> > > > I've got some thoughts on what a possible interface that wouldn't be
> > > > awful could look like, but I'm still hesitant because I don't really
> > > > know if exposing this sort of data is actually a good idea long term.
> > >
> > > I was also thinking (while working on an unrelated patch) we could use
> > > a system call like
> > >
> > > int clock_getoffset(clockid_t clkid, struct timespec *offs);
>
> We might make *offs a timespec64 or u64

I don't think we are ready yet to introduce timespec64 in the uapi headers,
this needs some more careful planning. Otherwise I agree it's bad to
introduce syscalls that we already know will become obsolete soon.

	Arnd
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Tue, Nov 4, 2014 at 12:01 AM, Arnd Bergmann wrote:
> On Monday 03 November 2014 17:11:53 John Stultz wrote:
>> I've got some thoughts on what a possible interface that wouldn't be
>> awful could look like, but I'm still hesitant because I don't really
>> know if exposing this sort of data is actually a good idea long term.
>
> I was also thinking (while working on an unrelated patch) we could use
> a system call like
>
> int clock_getoffset(clockid_t clkid, struct timespec *offs);
>
> that returns the current offset between CLOCK_REALTIME and the
> requested timebase. It is of course racy, but so is every use
> of CLOCK_REALTIME. We could also use a reference other than
> CLOCK_REALTIME that might be more stable, but passing two arbitrary
> clocks as input would make this much more complex to implement.

Yea, this is too racy for me, at least for it to be useful. You get an
offset, but you don't get any sense of what it was actually valid for.

I think to be at all useful, you'll have to return both a timestamp
for a given clockid, and an offset to the second clockid. That way you
can generate a valid point in time on two clocks (as best as possible,
given possible non-atomic reads of separately backed clockids).

But again, I'm not totally sure exposing this provides that much value
over userspace reading the two clocks itself (in ABA fashion) to sort
this out. And I also don't see it as particularly related to this perf
extension that Pawel is doing (since we are trying to avoid making the
perf clock a directly accessible clockid).

thanks
-john
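[Editorial illustration] The userspace "ABA" read mentioned above could look roughly like the sketch below. It reads the reference clock, then the second clock, then the reference clock again, and returns a timestamp, an offset and the width of the read window. The function name `sample_offset` and the midpoint estimate are assumptions made for illustration and are not part of any proposal in this thread; the `time.CLOCK_*` constants assume a Linux (or other Unix) host.

```python
import time


def sample_offset(ref_clock=time.CLOCK_MONOTONIC,
                  other_clock=time.CLOCK_REALTIME,
                  read_ns=time.clock_gettime_ns):
    """Read ref, other, ref (ABA) and return (ref_ts, offset, window) in ns.

    ref_ts  -- midpoint of the two reference reads (an estimate, not exact)
    offset  -- other clock minus ref_ts at that estimated instant
    window  -- time between the two reference reads; bounds the uncertainty
    """
    a1 = read_ns(ref_clock)
    b = read_ns(other_clock)
    a2 = read_ns(ref_clock)
    ref_ts = a1 + (a2 - a1) // 2
    return ref_ts, b - ref_ts, a2 - a1


if __name__ == "__main__":
    ts, off, win = sample_offset()
    print(f"monotonic={ts} ns, realtime-monotonic={off} ns, window={win} ns")
```

The returned window is what gives the offset "a sense of what it was valid for": a caller can discard samples whose window is too wide, for instance because the task was preempted between the reads.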
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Tue, 2014-11-04 at 09:24 +, Masami Hiramatsu wrote:
> What I'd like to do is the binary version of ftrace-marker, the text
> version is already supported by qemu (see below).
> https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg00505.html
>
> But since that is just a string data (not structured data), it is hard to
> analyze via perf-script or some other useful filters/triggers in ftrace.
>
> In my idea, the new event will be defined via a special file in debugfs like
> kprobe-events, like below.
>
> # cd $debugfs/tracing
> # echo "newgrp/newevent signarg:s32 flag:u64" >> marker_events
> # cat events/newgrp/newevent/format
> name: newevent
> ID: 2048
> format:
>     field:unsigned short common_type; offset:0; size:2; signed:0;
>     field:unsigned char common_flags; offset:2; size:1; signed:0;
>     field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
>     field:int common_pid; offset:4; size:4; signed:1;
>
>     field:s32 signarg; offset:8; size:4; signed:1;
>     field:u64 flag; offset:12; size:8; signed:0;
>
> print fmt: "signarg=%d flag=0x%Lx", REC->signarg, REC->flag
>
> Then, users will write the data (excluded common fields) when the event
> happens via trace_marker, which starts with '\0'ID(in u32). Kernel just
> checks the ID and its data size, but doesn't parse, filter/trigger it,
> and logs it into the kernel buffer.

Very neat, I like it! Certainly useful with scripting. Any gut feeling
regarding the kernel version it will be ready for? 3.19 or later than
that?

> Of course, this has a downside that the user must have a privilege to access
> to debugfs.
> Thus maybe we need both of prctl() IF for perf and this IF for ftrace.

I don't have any particularly strong feelings about the solution as long
as I'm able to create this "synchronisation point" of mine in the perf
data. In one of this patch's previous incarnations I was also doing a
write() to the perf fd to achieve pretty much the same result.

In my personal use case root access to debugfs isn't a problem (I need
it for other ftrace operations anyway). However Ingo and some other guys
seemed interested in prctl() approach because: 1. it's much simpler to
use even comparing with simple trace_marker's open(path)/write()/close()
and 2. because any process can do it at any time and the results are
quietly discarded if no one is listening. I also remember that when I
proposed sort of "unification" between trace_marker and the uevents,
Ingo straight away "suggested" keeping it separate.

Pawel
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Tue, 2014-11-04 at 01:25 +, Andy Lutomirski wrote:
> >> If you're going to add double-stamped packets, can you also add a
> >> syscall to read multiple clocks at once, atomically? Or can you
> >> otherwise add a non-perf mechanism to get at this data?
> >
> > I've got some thoughts on what a possible interface that wouldn't be
> > awful could look like, but I'm still hesitant because I don't really
> > know if exposing this sort of data is actually a good idea long term.
>
> My only real thought here is that, if perf is going to try to do this,
> then presumably it should be reasonably integrated w/ the core timing
> code. I.e. if perf does this, then presumably the core code should
> know about it and there should be a core interface to it.

I think I understand where you're coming from. Arnd's idea for the API
seems reasonable, although I can't promise implementing a proposal
(don't let me stop you from doing it :-).

As to the perf-specific correlation, I'm assuming limited accuracy.
Others already mentioned that in the absence of hardware support, the
time values are never really "atomic". The best that can be done is to
access them as near to each other in the code as possible and make sure
it happens in a non-preemptible section. In my tests I've achieved, on
average, sub-microsecond accuracy, which was good enough from my
perspective, but it's far from the ideal 42ns resolution (one period of
my, just an example, time source clocked at 24MHz).

Paweł
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Tue, 4 Nov 2014, Richard Cochran wrote:
> On Tue, Nov 04, 2014 at 09:01:31AM +0100, Arnd Bergmann wrote:
> > On Monday 03 November 2014 17:11:53 John Stultz wrote:
> > > I've got some thoughts on what a possible interface that wouldn't be
> > > awful could look like, but I'm still hesitant because I don't really
> > > know if exposing this sort of data is actually a good idea long term.
> >
> > I was also thinking (while working on an unrelated patch) we could use
> > a system call like
> >
> > int clock_getoffset(clockid_t clkid, struct timespec *offs);

We might make *offs a timespec64 or u64 :)

> > that returns the current offset between CLOCK_REALTIME and the
> > requested timebase. It is of course racy, but so is every use
> > of CLOCK_REALTIME. We could also use a reference other than
> > CLOCK_REALTIME that might be more stable, but passing two arbitrary
> > clocks as input would make this much more complex to implement.
>
> No, it is really easy to implement. Just drop the idea of "atomic". It
> really is not necessary or even possible.

If the two clocks have the same underlying hardware then you get an
'atomic' snapshot of their relationship. That's true for any
combination of CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME and
CLOCK_TAI. So we can and should expose these 'atomic' snapshots.

There is another reason why we want to support the notion of 'atomic'
snapshots: There exists hardware which gives you 'atomic' samples of
two different hardware clocks and there is more of that coming soon.

If that's not the case then you need two separate readouts which of
course cannot provide any guarantee, but I agree that we could do
something like what you do in the PTP_SYS_OFFSET ioctl and let user
space analyze the samples. But that should be optional.

Thanks,

	tglx
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
Hello,

(2014/11/04 9:28), Pawel Moll wrote:
> 2. User event generation
>
> Everyone present agreed that it would be a very-nice-to-have feature.
> There was some discussion about implementation details, so I welcome
> feedback and comments regarding my take on the matter.

Hmm, I'm trying to make a similar thing, dynamic event definition via
ftrace, which is already done by kprobes/uprobes. And this will be shown
as dynamic events from perf too.

What I'd like to do is the binary version of ftrace-marker, the text
version is already supported by qemu (see below).
https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg00505.html

But since that is just a string data (not structured data), it is hard to
analyze via perf-script or some other useful filters/triggers in ftrace.

In my idea, the new event will be defined via a special file in debugfs like
kprobe-events, like below.

 # cd $debugfs/tracing
 # echo "newgrp/newevent signarg:s32 flag:u64" >> marker_events
 # cat events/newgrp/newevent/format
 name: newevent
 ID: 2048
 format:
     field:unsigned short common_type; offset:0; size:2; signed:0;
     field:unsigned char common_flags; offset:2; size:1; signed:0;
     field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
     field:int common_pid; offset:4; size:4; signed:1;

     field:s32 signarg; offset:8; size:4; signed:1;
     field:u64 flag; offset:12; size:8; signed:0;

 print fmt: "signarg=%d flag=0x%Lx", REC->signarg, REC->flag

Then, users will write the data (excluded common fields) when the event
happens via trace_marker, which starts with '\0'ID(in u32). Kernel just
checks the ID and its data size, but doesn't parse, filter/trigger it,
and logs it into the kernel buffer.

Of course, this has a downside that the user must have a privilege to access
to debugfs.
Thus maybe we need both of prctl() IF for perf and this IF for ftrace.

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com
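[Editorial illustration] To make the proposed record layout concrete, here is a minimal sketch of how a userspace tool might build the binary payload for the example event above: a '\0' byte, the event ID as u32, then the non-common fields in format-file order (s32 signarg, u64 flag, packed without padding as the offsets 8 and 12 suggest). The helper name `pack_marker_event` is hypothetical, native byte order is an assumption, and the marker_events interface itself was only a proposal at the time of this thread, so nothing here reflects an existing kernel ABI.

```python
import struct

# "=" selects native byte order with standard sizes and no alignment padding:
# I = u32 event ID, i = s32 signarg, Q = u64 flag.
_RECORD = struct.Struct("=IiQ")


def pack_marker_event(event_id, signarg, flag):
    """Build the bytes that would be written to trace_marker in one write()."""
    # Leading NUL distinguishes a binary record from a text marker
    # (as described in the proposal above).
    return b"\0" + _RECORD.pack(event_id, signarg, flag)


if __name__ == "__main__":
    payload = pack_marker_event(2048, -1, 0x10)
    print(len(payload), payload.hex())
```

Under the proposal, the kernel would only check the ID and the payload size (12 bytes of field data here) before logging the record, so the writer is responsible for matching the declared field layout.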
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Tue, Nov 04, 2014 at 09:01:31AM +0100, Arnd Bergmann wrote:
> On Monday 03 November 2014 17:11:53 John Stultz wrote:
> > I've got some thoughts on what a possible interface that wouldn't be
> > awful could look like, but I'm still hesitant because I don't really
> > know if exposing this sort of data is actually a good idea long term.
>
> I was also thinking (while working on an unrelated patch) we could use
> a system call like
>
> int clock_getoffset(clockid_t clkid, struct timespec *offs);
>
> that returns the current offset between CLOCK_REALTIME and the
> requested timebase. It is of course racy, but so is every use
> of CLOCK_REALTIME. We could also use a reference other than
> CLOCK_REALTIME that might be more stable, but passing two arbitrary
> clocks as input would make this much more complex to implement.

No, it is really easy to implement. Just drop the idea of "atomic". It
really is not necessary or even possible.

Thanks,
Richard
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Mon, Nov 03, 2014 at 05:11:53PM -0800, John Stultz wrote:
> On Mon, Nov 3, 2014 at 4:58 PM, Andy Lutomirski wrote:
> > If you're going to add double-stamped packets, can you also add a
> > syscall to read multiple clocks at once, atomically? Or can you
> > otherwise add a non-perf mechanism to get at this data?

Does not need to be "atomic". In fact it cannot be atomic in the
general case. Some clocks are read over memory mapped registers, but
others are read over slow and sleepy buses like PCIe or MDIO.

> > Because the realtime to monotonic offset is really quite useful for
> > things like this, and it seems silly to make people actually open a
> > perf_event to get at it.
>
> So this comes up periodically, but I don't think I've seen an interface
> proposal that was decent yet.

We have ioctl PTP_SYS_OFFSET that alternately reads a dynamic clock
and CLOCK_REALTIME a given number of times. This is done without locks
or any kind of "atomic" guarantees, and it works quite well in
practice. The user can pick the number of repetitions to deal with
noisy run time environments, and usually it is a simple matter of
picking the reading with the shortest duration. However, the user is
free to do statistics over the readings in any way he wants.

It would be nice (and people have requested) a syscall that takes two
clockid_t arguments but otherwise works like PTP_SYS_OFFSET.

We really will never have to support more than two clocks. The
application will pick one clock as the reference and then measure each
of the other clocks relative to it, one at a time. The performance
should be perfectly adequate, even better than reading three or more
at once (with the understanding that these are "software" time
stamps).

Thanks,
Richard
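[Editorial illustration] The "pick the reading with the shortest duration" step described above can be sketched as a small post-processing function. It assumes each reading is a (ref_before, other, ref_after) triple in nanoseconds, which mirrors the alternating pattern described for PTP_SYS_OFFSET but is not the ioctl's actual data structure; the function name `best_sample` and the midpoint estimate are illustrative assumptions.

```python
def best_sample(samples):
    """Return (offset_ns, duration_ns) from the tightest reading.

    samples -- iterable of (ref_before, other, ref_after) tuples in ns,
               where the reference clock is read before and after the
               other clock.
    """
    # The reading with the smallest gap between the two reference reads
    # was least disturbed by preemption or slow bus accesses.
    before, other, after = min(samples, key=lambda s: s[2] - s[0])
    midpoint = before + (after - before) // 2
    return other - midpoint, after - before


if __name__ == "__main__":
    readings = [(0, 1000, 40), (100, 1105, 110), (200, 1230, 260)]
    print(best_sample(readings))
```

Choosing the minimum is only the simplest policy; as the message notes, a caller is free to run any other statistics over the full set of readings.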
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Monday 03 November 2014 17:11:53 John Stultz wrote:
> I've got some thoughts on what a possible interface that wouldn't be
> awful could look like, but I'm still hesitant because I don't really
> know if exposing this sort of data is actually a good idea long term.

I was also thinking (while working on an unrelated patch) we could use
a system call like

int clock_getoffset(clockid_t clkid, struct timespec *offs);

that returns the current offset between CLOCK_REALTIME and the
requested timebase. It is of course racy, but so is every use
of CLOCK_REALTIME. We could also use a reference other than
CLOCK_REALTIME that might be more stable, but passing two arbitrary
clocks as input would make this much more complex to implement.

	Arnd
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Mon, Nov 3, 2014 at 5:11 PM, John Stultz wrote:
> On Mon, Nov 3, 2014 at 4:58 PM, Andy Lutomirski wrote:
>> On Mon, Nov 3, 2014 at 4:28 PM, Pawel Moll wrote:
>>> From: Pawel Moll
>>> Thomas suggested solution which gets down to my original proposal for
>>> sched/monotonic clock correlation - an additional sample type so events
>>> can be "double stamped" using different clock sources providing
>>> synchronisation points for later time approximation. I've just extended
>>> the implementation with configuration value to select the clock source.
>>> If the first patch (making perf timestamps monotonic) gets accepted,
>>> there will be no immediate need for this one, but I'd like to gain some
>>> feedback anyway.
>>>
>>
>> I have nothing intelligent to add to the potential Thomas/Ingo
>> showdown, but I do have a related thought. :)
>>
>> If you're going to add double-stamped packets, can you also add a
>> syscall to read multiple clocks at once, atomically? Or can you
>> otherwise add a non-perf mechanism to get at this data?
>>
>> Because the realtime to monotonic offset is really quite useful for
>> things like this, and it seems silly to make people actually open a
>> perf_event to get at it.
>
> So this comes up periodically, but I don't think I've seen an interface
> proposal that was decent yet.
>
> Also, if you want to read multiple clocks at once, do you stop at two,
> or three, or... there's possibly quite a few. Additionally some
> clocks may not be possible to read atomically (perf/sched clock and
> system time for example may be based on different underlying
> clocksources). The general idea feels like it's creeping towards some
> "atomically expose all timekeeping state" mega-interface.
>
> I've got some thoughts on what a possible interface that wouldn't be
> awful could look like, but I'm still hesitant because I don't really
> know if exposing this sort of data is actually a good idea long term.

My only real thought here is that, if perf is going to try to do this,
then presumably it should be reasonably integrated w/ the core timing
code. I.e. if perf does this, then presumably the core code should
know about it and there should be a core interface to it.

--Andy

>
> thanks
> -john

--
Andy Lutomirski
AMA Capital Management, LLC
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Mon, Nov 3, 2014 at 4:58 PM, Andy Lutomirski wrote:
> On Mon, Nov 3, 2014 at 4:28 PM, Pawel Moll wrote:
>> From: Pawel Moll
>> Thomas suggested solution which gets down to my original proposal for
>> sched/monotonic clock correlation - an additional sample type so events
>> can be "double stamped" using different clock sources providing
>> synchronisation points for later time approximation. I've just extended
>> the implementation with configuration value to select the clock source.
>> If the first patch (making perf timestamps monotonic) gets accepted,
>> there will be no immediate need for this one, but I'd like to gain some
>> feedback anyway.
>>
>
> I have nothing intelligent to add to the potential Thomas/Ingo
> showdown, but I do have a related thought. :)
>
> If you're going to add double-stamped packets, can you also add a
> syscall to read multiple clocks at once, atomically? Or can you
> otherwise add a non-perf mechanism to get at this data?
>
> Because the realtime to monotonic offset is really quite useful for
> things like this, and it seems silly to make people actually open a
> perf_event to get at it.

So this comes up periodically, but I don't think I've seen an interface
proposal that was decent yet.

Also, if you want to read multiple clocks at once, do you stop at two,
or three, or... there's possibly quite a few. Additionally some
clocks may not be possible to read atomically (perf/sched clock and
system time for example may be based on different underlying
clocksources). The general idea feels like it's creeping towards some
"atomically expose all timekeeping state" mega-interface.

I've got some thoughts on what a possible interface that wouldn't be
awful could look like, but I'm still hesitant because I don't really
know if exposing this sort of data is actually a good idea long term.

thanks
-john
Re: [PATCH v3 0/3] perf: User/kernel time correlation and event generation
On Mon, Nov 3, 2014 at 4:28 PM, Pawel Moll wrote:
> From: Pawel Moll
> Thomas suggested solution which gets down to my original proposal for
> sched/monotonic clock correlation - an additional sample type so events
> can be "double stamped" using different clock sources providing
> synchronisation points for later time approximation. I've just extended
> the implementation with configuration value to select the clock source.
> If the first patch (making perf timestamps monotonic) gets accepted,
> there will be no immediate need for this one, but I'd like to gain some
> feedback anyway.
>

I have nothing intelligent to add to the potential Thomas/Ingo
showdown, but I do have a related thought. :)

If you're going to add double-stamped packets, can you also add a
syscall to read multiple clocks at once, atomically? Or can you
otherwise add a non-perf mechanism to get at this data?

Because the realtime to monotonic offset is really quite useful for
things like this, and it seems silly to make people actually open a
perf_event to get at it.

--Andy