Re: [PATCH] Add the latency-collector to tools
On Mon, 15 Feb 2021 11:54:08 + wrote: > Just a quick update: I was able to reproduce this weird behavior that I wrote > about but it turned out to be a problem with my testing. > > The test was faulty because of a mistake that I had made. > > Sorry for the extra noise. No problem. Thanks for letting me know. -- Steve
Re: [PATCH] Add the latency-collector to tools
On Fri, 2021-02-12 at 13:16 +0100, Viktor Rosendahl wrote: > > > However, for some reason I cannot reproduce the behavior now, allthough I use > exactly the same kernel. > > Because humans are more often at fault than computers, I cannot deny the > possibility that I would have misconfigured something and it was all the > result > of a faulty test. > Just a quick update: I was able to reproduce this weird behavior that I wrote about but it turned out to be a problem with my testing. The test was faulty because of a mistake that I had made. Sorry for the extra noise. best regards, Viktor > I will let you know if come up with a way of reproducing this behavior later. > I > cannot spend more time on it right now. > > >
Re: [PATCH] Add the latency-collector to tools
On Thu, 2021-02-11 at 14:56 -0500, Steven Rostedt wrote: > On Thu, 11 Feb 2021 17:17:42 +0100 > Viktor Rosendahl wrote: > > > It seems to work but I discovered during testing that it seems like newer > > kernels have a tendency to lose some latencies in longer bursts. I guess > > this > > is likely to be another regression in the preemptirqsoff tracer. > > Not sure what you mean by the above. I'd be interested in fixing it if is > really a problem. Sorry, my bad; I should have been more specific. Yesterday, when I tested the latency-collector, I could see that there were clear signs of missing latencies. I used preemptirq_delay_test to generate bursts of 10 and never got any stack traces from the 10th latency (nor the 9th or 8th). I think I used 2000us as a threshold and generated latencies of 3000us. Also, sometimes, I could see that there were not enough (meaning 9) messages like "printout skipped due to random delay" before a "randomly sleep for 1000 ms before print", and then the stack trace would point to a latency from the beginning of the burst, and not the last, as would be expected if the latency- collector slept for 1000 ms. I could see the problem also without the latency-collector, because if I generated a burst of 10 and then checked /sys/kernel/tracing/trace, I would not see the the stack trace from the 10th latency but from one of the earlier ones, that is, I would for example see the preemptirqtest_2 function in the trace, instead of preemptirqtest_9, as expected. IIRC, I sometimes saw the reemptirqtest_0 function, indicating that only the first was captured and the rest were lost. However, for some reason I cannot reproduce the behavior now, allthough I use exactly the same kernel. Because humans are more often at fault than computers, I cannot deny the possibility that I would have misconfigured something and it was all the result of a faulty test. I will let you know if come up with a way of reproducing this behavior later. I cannot spend more time on it right now. > > > > I didn't look too deeply at the rest, just skimmed it basically, and I > tried it out. > > I'm fine with pulling this into my queue, as it's just a tool and wont > cause any other issues. I can move some of the files in scripts that deals > with tracing into the tools/tracing directory too. Maybe this should be > placed in a sub directory? tools/tracing/latency/ ? > > Feel free to submit a proper patch (proper change log, etc). > Ok, thanks, I think that I have incorporated all of your suggestions in the patch that I already sent out earlier today. best regards, Viktor
Re: [PATCH] Add the latency-collector to tools
On Thu, 11 Feb 2021 17:17:42 +0100 Viktor Rosendahl wrote: > It seems to work but I discovered during testing that it seems like newer > kernels have a tendency to lose some latencies in longer bursts. I guess this > is likely to be another regression in the preemptirqsoff tracer. Not sure what you mean by the above. I'd be interested in fixing it if is really a problem. > diff --git a/tools/tracing/Makefile b/tools/tracing/Makefile > new file mode 100644 > index ..f53859765154 > --- /dev/null > +++ b/tools/tracing/Makefile > @@ -0,0 +1,23 @@ > +# SPDX-License-Identifier: GPL-2.0 > +# Makefile for vm tools > +# > +VAR_CFLAGS := $(shell pkg-config --cflags libtracefs 2>/dev/null) > +VAR_LDLIBS := $(shell pkg-config --libs libtracefs 2>/dev/null) > + > +TARGETS = latency-collector > +CFLAGS = -Wall -Wextra -g -O2 $(VAR_CFLAGS) > +LDFLAGS = -lpthread $(VAR_LDLIBS) > + > +all: $(TARGETS) > + > +%: %.c > + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) > + > +clean: > + $(RM) latency-collector > + > +sbindir ?= /usr/sbin I wonder if it should default to /usr/local? prefix ?= /usr/local sbindir ?= ${prefix}/sbin > + > +install: all > + install -d $(DESTDIR)$(sbindir) > + install -m 755 -p $(TARGETS) $(DESTDIR)$(sbindir) > diff --git a/tools/tracing/latency-collector.c > b/tools/tracing/latency-collector.c > new file mode 100644 > index ..e9aa7a47a8a3 > --- /dev/null > +++ b/tools/tracing/latency-collector.c > @@ -0,0 +1,2102 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (C) 2017, 2018, 2019, 2021 BMW Car IT GmbH > + * Author: Viktor Rosendahl (viktor.rosend...@bmw.de) > + */ > + > +#define _GNU_SOURCE > +#define _POSIX_C_SOURCE 200809L > + > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +static const char *prg_name; > +static const char *prg_unknown = "unknown program name"; > + > +static int fd_stdout; > + > +static int sched_policy; > +static bool sched_policy_set; > + > +static int sched_pri; > +static bool sched_pri_set; > + > +static bool trace_enable = true; > +static bool setup_ftrace = true; > +static bool use_random_sleep; > + > +enum traceopt { > + OPTIDX_FUNC_TR = 0, > + OPTIDX_DISP_GR, > + OPTIDX_NR > +}; > + > +/* Should be in the order of enum traceopt */ > +static const char * const optstr[] = { > + "function-trace", /* OPTIDX_FUNC_TR */ > + "display-graph", /* OPTIDX_DISP_GR */ > + NULL > +}; A trick to force the enum and string order is this: #define TRACE_OPTS \ C(FUNC_TR, "function-trace"), \ C(DISP_GR, "display-graph") #undef C #define C(a, b) OPTIDX_##a enum traceopt { TRACE_OPTS, OPTIDX_NR }; #undef C #define C(a, b) b static const char *cost optstr[] = { TRACE_OPTS, NULL }; > + > +enum errhandling { > + ERR_EXIT = 0, > + ERR_WARN, > + ERR_CLEANUP, > +}; I didn't look too deeply at the rest, just skimmed it basically, and I tried it out. I'm fine with pulling this into my queue, as it's just a tool and wont cause any other issues. I can move some of the files in scripts that deals with tracing into the tools/tracing directory too. Maybe this should be placed in a sub directory? tools/tracing/latency/ ? Feel free to submit a proper patch (proper change log, etc). Thanks, -- Steve