Re: [PATCH] Add the latency-collector to tools

2021-02-15 Thread Steven Rostedt
On Mon, 15 Feb 2021 11:54:08 +
Viktor Rosendahl wrote:

> Just a quick update: I was able to reproduce this weird behavior that I wrote
> about, but it turned out to be a problem with my testing.
> 
> The test was faulty because of a mistake that I had made.
> 
> Sorry for the extra noise.

No problem. Thanks for letting me know.

-- Steve


Re: [PATCH] Add the latency-collector to tools

2021-02-15 Thread Viktor.Rosendahl
On Fri, 2021-02-12 at 13:16 +0100, Viktor Rosendahl wrote:
> 
> 
> However, for some reason I cannot reproduce the behavior now, although I use
> exactly the same kernel.
> 
> Because humans are more often at fault than computers, I cannot deny the
> possibility that I misconfigured something and it was all the result of a
> faulty test.
> 

Just a quick update: I was able to reproduce this weird behavior that I wrote
about, but it turned out to be a problem with my testing.

The test was faulty because of a mistake that I had made.

Sorry for the extra noise.

best regards,

Viktor


> I will let you know if I come up with a way of reproducing this behavior
> later. I cannot spend more time on it right now.
> 



Re: [PATCH] Add the latency-collector to tools

2021-02-12 Thread Viktor.Rosendahl
On Thu, 2021-02-11 at 14:56 -0500, Steven Rostedt wrote:
> On Thu, 11 Feb 2021 17:17:42 +0100
> Viktor Rosendahl  wrote:
> 
> > It seems to work, but during testing I discovered that newer kernels seem
> > to have a tendency to lose some latencies in longer bursts. I guess this is
> > likely to be another regression in the preemptirqsoff tracer.
> 
> Not sure what you mean by the above. I'd be interested in fixing it if it is
> really a problem.


Sorry, my bad; I should have been more specific.

Yesterday, when I tested the latency-collector, I could see that there were
clear signs of missing latencies. I used preemptirq_delay_test to generate
bursts of 10 and never got any stack traces from the 10th latency (nor the 9th
or 8th). 

I think I used 2000us as a threshold and generated latencies of 3000us.
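
For reference, I generated the bursts roughly like this (the module parameter
names are from memory, so they may not be exact):

  modprobe preemptirq_delay_test test_mode=irq delay=3000 burst_size=10
  echo 1 > /sys/kernel/preemptirq_delay_test/trigger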

Also, I could sometimes see that there were fewer than the expected 9 messages
of the form "printout skipped due to random delay" before a "randomly sleep for
1000 ms before print" message, and then the stack trace would point to a latency
from the beginning of the burst, and not to the last one, as would have been
expected if the latency-collector had slept for 1000 ms.

I could see the problem also without the latency-collector: if I generated a
burst of 10 and then checked /sys/kernel/tracing/trace, I would not see the
stack trace from the 10th latency but from one of the earlier ones. That is, I
would for example see the preemptirqtest_2 function in the trace instead of the
expected preemptirqtest_9. IIRC, I sometimes saw the preemptirqtest_0 function,
indicating that only the first latency was captured and the rest were lost.
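
Something along the following lines was enough to see it (again from memory,
with the preemptirq_delay_test module loaded as above):

  echo preemptirqsoff > /sys/kernel/tracing/current_tracer
  echo 2000 > /sys/kernel/tracing/tracing_thresh
  echo 1 > /sys/kernel/preemptirq_delay_test/trigger
  grep preemptirqtest /sys/kernel/tracing/trace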

However, for some reason I cannot reproduce the behavior now, although I use
exactly the same kernel.

Because humans are more often at fault than computers, I cannot deny the
possibility that I misconfigured something and it was all the result of a
faulty test.

I will let you know if I come up with a way of reproducing this behavior later.
I cannot spend more time on it right now.

> 
> 
> 
> I didn't look too deeply at the rest, just skimmed it basically, and I
> tried it out.
> 
> I'm fine with pulling this into my queue, as it's just a tool and won't
> cause any other issues. I can move some of the files in scripts that deal
> with tracing into the tools/tracing directory too. Maybe this should be
> placed in a subdirectory? tools/tracing/latency/ ?
> 
> Feel free to submit a proper patch (proper change log, etc).
> 

Ok, thanks. I think I have incorporated all of your suggestions in the patch
that I already sent out earlier today.

best regards,

Viktor



Re: [PATCH] Add the latency-collector to tools

2021-02-11 Thread Steven Rostedt
On Thu, 11 Feb 2021 17:17:42 +0100
Viktor Rosendahl  wrote:

> It seems to work, but during testing I discovered that newer kernels seem to
> have a tendency to lose some latencies in longer bursts. I guess this is
> likely to be another regression in the preemptirqsoff tracer.

Not sure what you mean by the above. I'd be interested in fixing it if it is
really a problem.

> diff --git a/tools/tracing/Makefile b/tools/tracing/Makefile
> new file mode 100644
> index ..f53859765154
> --- /dev/null
> +++ b/tools/tracing/Makefile
> @@ -0,0 +1,23 @@
> +# SPDX-License-Identifier: GPL-2.0
> +# Makefile for tracing tools
> +#
> +VAR_CFLAGS := $(shell pkg-config --cflags libtracefs 2>/dev/null)
> +VAR_LDLIBS := $(shell pkg-config --libs libtracefs 2>/dev/null)
> +
> +TARGETS = latency-collector
> +CFLAGS = -Wall -Wextra -g -O2 $(VAR_CFLAGS)
> +LDFLAGS = -lpthread $(VAR_LDLIBS)
> +
> +all: $(TARGETS)
> +
> +%: %.c
> + $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
> +
> +clean:
> + $(RM) latency-collector
> +
> +sbindir ?= /usr/sbin

I wonder if it should default to /usr/local?

 prefix ?= /usr/local
 sbindir ?= ${prefix}/sbin

> +
> +install: all
> + install -d $(DESTDIR)$(sbindir)
> + install -m 755 -p $(TARGETS) $(DESTDIR)$(sbindir)
> diff --git a/tools/tracing/latency-collector.c b/tools/tracing/latency-collector.c
> new file mode 100644
> index ..e9aa7a47a8a3
> --- /dev/null
> +++ b/tools/tracing/latency-collector.c
> @@ -0,0 +1,2102 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2017, 2018, 2019, 2021 BMW Car IT GmbH
> + * Author: Viktor Rosendahl (viktor.rosend...@bmw.de)
> + */
> +
> +#define _GNU_SOURCE
> +#define _POSIX_C_SOURCE 200809L
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static const char *prg_name;
> +static const char *prg_unknown = "unknown program name";
> +
> +static int fd_stdout;
> +
> +static int sched_policy;
> +static bool sched_policy_set;
> +
> +static int sched_pri;
> +static bool sched_pri_set;
> +
> +static bool trace_enable = true;
> +static bool setup_ftrace = true;
> +static bool use_random_sleep;
> +
> +enum traceopt {
> + OPTIDX_FUNC_TR = 0,
> + OPTIDX_DISP_GR,
> + OPTIDX_NR
> +};
> +
> +/* Should be in the order of enum traceopt */
> +static  const char * const optstr[] = {
> + "function-trace", /* OPTIDX_FUNC_TR  */
> + "display-graph",  /* OPTIDX_DISP_GR  */
> + NULL
> +};

A trick to force the enum and string order is this:


#define TRACE_OPTS  \
  C(FUNC_TR, "function-trace"), \
  C(DISP_GR, "display-graph")

#undef C
#define C(a, b) OPTIDX_##a

enum traceopt {
TRACE_OPTS,
OPTIDX_NR
};

#undef C
#define C(a, b)  b

static const char * const optstr[] = {
TRACE_OPTS,
NULL
};
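
That is, after the preprocessor has run, the two definitions above expand to
roughly:

enum traceopt {
	OPTIDX_FUNC_TR,
	OPTIDX_DISP_GR,
	OPTIDX_NR
};

static const char * const optstr[] = {
	"function-trace",
	"display-graph",
	NULL
};

so adding a new option only requires touching the TRACE_OPTS list, and the
enum and the string array can never get out of sync.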


> +
> +enum errhandling {
> + ERR_EXIT = 0,
> + ERR_WARN,
> + ERR_CLEANUP,
> +};

I didn't look too deeply at the rest, just skimmed it basically, and I
tried it out.

I'm fine with pulling this into my queue, as it's just a tool and won't
cause any other issues. I can move some of the files in scripts that deal
with tracing into the tools/tracing directory too. Maybe this should be
placed in a subdirectory? tools/tracing/latency/ ?

Feel free to submit a proper patch (proper change log, etc).

Thanks,

-- Steve