from:"Daniel Bristot de Oliveira"

Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()

2024-06-05 Thread Daniel Bristot de Oliveira

On 6/5/24 15:24, Qais Yousef wrote:
>>> But rt is a shortened version of realtime, and so it is making *it less*
>>> clear that we also have DL here.
>> Can SCHED_DL be considered a real-time scheduling class as in opposite
>> to SCHED_BATCH for instance? Due to its requirements it fits for a real
>> time scheduling class, right?
>> And RT (as in real time) already includes SCHED_RR and SCHED_FIFO.
> Yeah I think the usage of realtime to cover both makes sense. I followed your
> precedence with task_is_realtime().
> 
> Anyway. If people really find this confusing, what would make sense is to 
> split
> them and ask users to call rt_task() and dl_task() explicitly without this
> wrapper. I personally like it better with the wrapper. But happy to follow the
> crowd.

For me, doing dl_ things it is better to keep them separate, so I can
easily search for dl_ specific checks.

rt_or_dl_task(p);

would also make it clear that we have both.

-- Daniel

Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()

2024-06-05 Thread Daniel Bristot de Oliveira

On 6/5/24 11:32, Sebastian Andrzej Siewior wrote:
> On 2024-06-04 17:57:46 [+0200], Daniel Bristot de Oliveira wrote:
>> On 6/4/24 16:42, Qais Yousef wrote:
>>> -   (wakeup_rt && !dl_task(p) && !rt_task(p)) ||
>>> +   (wakeup_rt && !realtime_task(p)) ||
>>
>> I do not like bikeshedding, and no hard feelings...
>>
>> But rt is a shortened version of realtime, and so it is making *it less*
>> clear that we also have DL here.
> 
> Can SCHED_DL be considered a real-time scheduling class as in opposite
> to SCHED_BATCH for instance? Due to its requirements it fits for a real
> time scheduling class, right?
> And RT (as in real time) already includes SCHED_RR and SCHED_FIFO.

It is a real-time scheduler, but the problem is that FIFO and RR are in rt.c and
they are called the "realtime" ones, so they are the first to come in mind.

-- Daniel

>> -- Daniel
> 
> Sebastian
>

Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()

2024-06-04 Thread Daniel Bristot de Oliveira

On 6/4/24 18:37, Steven Rostedt wrote:
> On Tue, 4 Jun 2024 17:57:46 +0200
> Daniel Bristot de Oliveira  wrote:
> 
>> On 6/4/24 16:42, Qais Yousef wrote:
>>> -   (wakeup_rt && !dl_task(p) && !rt_task(p)) ||
>>> +   (wakeup_rt && !realtime_task(p)) ||  
>>
>> I do not like bikeshedding, and no hard feelings...
>>
>> But rt is a shortened version of realtime, and so it is making *it less*
>> clear that we also have DL here.
>>
>> I know we can always read the comments, but we can do without changes
>> as well...
>>
>> I would suggest finding the plural version for realtime_task()... so
>> we know it is not about the "rt" scheduler, but rt and dl schedulers.
> 
> priority_task() ?

rt_or_dl_task() ?
rt_schedulers_task() ?
higher_than_fair_task() ? (this is bad haha)

I am not good with names, and it is hard to find one, I know but something
to make it clear that dl is also there becase rt/realtime is ambiguous with
the rt.c.

-- Daniel

Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()

2024-06-04 Thread Daniel Bristot de Oliveira

On 6/4/24 16:42, Qais Yousef wrote:
> - (wakeup_rt && !dl_task(p) && !rt_task(p)) ||
> + (wakeup_rt && !realtime_task(p)) ||

I do not like bikeshedding, and no hard feelings...

But rt is a shortened version of realtime, and so it is making *it less*
clear that we also have DL here.

I know we can always read the comments, but we can do without changes
as well...

I would suggest finding the plural version for realtime_task()... so
we know it is not about the "rt" scheduler, but rt and dl schedulers.

-- Daniel

Re: [PATCH] rv: Update rv_en(dis)able_monitor doc to match kernel-doc

2024-05-28 Thread Daniel Bristot de Oliveira

On 5/20/24 07:42, Yang Li wrote:
> The patch updates the function documentation comment for
> rv_en(dis)able_monitor to adhere to the kernel-doc specification.
> 
> Signed-off-by: Yang Li 

Acked-by: Daniel Bristot de Oliveira 

Thanks
-- Daniel

Re: [PATCH -next] rv: Update rv_en(dis)able_monitor doc to match kernel-doc

2024-05-17 Thread Daniel Bristot de Oliveira

Hi Yang

On 5/17/24 11:14, Yang Li wrote:
> The patch updates the function documentation comment for
> rv_en(dis)able_monitor to adhere to the kernel-doc specification.
> 
> Signed-off-by: Yang Li 
> ---
>  kernel/trace/rv/rv.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
> index 2f68e93fff0b..df0745a42a3f 100644
> --- a/kernel/trace/rv/rv.c
> +++ b/kernel/trace/rv/rv.c
> @@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def 
> *mdef, bool sync)
>  
>  /**
>   * rv_disable_monitor - disable a given runtime monitor
> + * @mdef: Pointer to the monitor definition structure.

This change is in for mainline kernel, why are you using the -next on the 
Subject?

-- Daniel

[GIT PULL] tracing/tools: Updates for 6.10

2024-05-17 Thread Daniel Bristot de Oliveira

Steven,

RTLA:
Specific for timerlat:

- Improves the output of timerlat top by adding a missing \n,
  and by avoiding printing color-formatting characters where
  they are translated to regular characters.

- Improves timerlat auto-analysis output by replacing '\t'
  with spaces to avoid copy-and-paste issues when reporting
  problems.

- For timerlat, make the user-space (-u) option the default,
  as it is the most complete test. Add a -k option to use
  the in-kernel workload.

- On timerlat top and hist, add a summary with the overall
  results. For instance, the minimum value for all CPUs,
  the overall average and the maximum value from all CPUs.

- timerlat hist was printing initial values (i.e., 0 as max,
  and ~0 as min) if the trace stopped before the first Ret-User
  event. This problem was fixed by printing the "   - "
  no value string to the output if that was the case.

For all RTLA tools:

- Add a --warm-up  option, allowing the workload to
  run for  before starting to collect results.

- Add a --trace-buffer-size option, allowing the user to set
  the tracing buffer size for -t option. This option is mainly
  useful for reducing the trace file. Now rtla depends on
  libtracefs >= 1.6.

- Fix the -t [trace_file] parsing, now it does not require
  the '=' before the option parameter, and better handles the
  multiple ways a user can pass the trace_file.txt


Please pull the latest trace-tools-v6.10 tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git
trace-tools-v6.10

Tag SHA1: dbd633e7f81bac0114f609ff2eade9f4e66a28e2
Head SHA1: 59c22f70b2951d81de410d477ae536ba951b4f37

Daniel Bristot de Oliveira (8):
  rtla/timerlat: Simplify "no value" printing on top
  rtla/auto-analysis: Replace \t with spaces
  rtla/timerlat: Use pretty formatting only on interactive tty
  rtla/timerlat: Add a summary for top mode
  rtla/timerlat: Add a summary for hist mode
  rtla: Add the --warm-up option
  rtla/timerlat: Make user-space threads the default
  rtla: Add --trace-buffer-size option

John Kacur (3):
  rtla/timerlat: Fix histogram report when a cpu count is 0
  rtla: Fix -t\--trace[=file]
  rtla: Documentation: Fix -t, --trace


 Documentation/tools/rtla/common_options.rst|  11 +-
 .../tools/rtla/common_osnoise_options.rst  |   4 +
 .../tools/rtla/common_timerlat_options.rst |  10 +-
 tools/tracing/rtla/Makefile.config |   2 +-
 tools/tracing/rtla/src/osnoise_hist.c  |  55 +++-
 tools/tracing/rtla/src/osnoise_top.c   |  55 +++-
 tools/tracing/rtla/src/timerlat_aa.c   | 109 
 tools/tracing/rtla/src/timerlat_hist.c | 294 ++---
 tools/tracing/rtla/src/timerlat_top.c  | 250 +++---
 tools/tracing/rtla/src/trace.c |  15 ++
 tools/tracing/rtla/src/trace.h |   1 +
 11 files changed, 653 insertions(+), 153 deletions(-)

[GIT PULL V2] tracing/tools: Updates for 6.9

2024-03-20 Thread Daniel Bristot de Oliveira

Steven,

[
   I rebased my queue on top of the v6.8 tag.
]

Tracing:
- Update makefiles for latency-collector and RTLA,
  using tools/build/ makefiles like perf does, inheriting
  its benefits. For example, having a proper way to
  handle dependencies.

- The timerlat tracer has an interface for any tool to use.
  rtla timerlat tool uses this interface dispatching its
  own threads as workload. But, rtla timerlat could also be
  used for any other process. So, add 'rtla timerlat -U'
  option, allowing the timerlat tool to measure the latency of
  any task using the timerlat tracer interface.

Verification:
- Update makefiles for verification/rv, using tools/build/
  makefiles like perf does, inheriting its benefits.
  For example, having a proper way to handle dependencies.


Please pull the latest trace-tools-v6.9-2 tree, which can be found at:


  git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git
trace-tools-v6.9-2

Tag SHA1: e8d5e0f38601c3718874d95db2a0020ab1c454df
Head SHA1: a23c05fd76cf4ad27e0c74f7a93e7b089e94a55c


Daniel Bristot de Oliveira (4):
  tools/tracing: Use tools/build makefiles on latency-collector
  tools/rtla: Use tools/build makefiles to build rtla
  tools/verification: Use tools/build makefiles on rv
  tools/rtla: Add -U/--user-load option to timerlat


 .../tools/rtla/common_timerlat_options.rst |   6 +
 tools/tracing/latency/.gitignore   |   5 +-
 tools/tracing/latency/Build|   1 +
 tools/tracing/latency/Makefile | 105 --
 tools/tracing/latency/Makefile.config  |  30 +++
 tools/tracing/rtla/.gitignore  |   7 +-
 tools/tracing/rtla/Build   |   1 +
 tools/tracing/rtla/Makefile| 217 +++--
 tools/tracing/rtla/Makefile.config |  47 +
 tools/tracing/rtla/Makefile.rtla   |  80 
 tools/tracing/rtla/Makefile.standalone |  26 +++
 tools/tracing/rtla/sample/timerlat_load.py |  74 +++
 tools/tracing/rtla/src/Build   |  11 ++
 tools/tracing/rtla/src/timerlat_hist.c |  16 +-
 tools/tracing/rtla/src/timerlat_top.c  |  14 +-
 tools/verification/rv/.gitignore   |   6 +
 tools/verification/rv/Build|   1 +
 tools/verification/rv/Makefile | 207 +++-
 tools/verification/rv/Makefile.config  |  47 +
 tools/verification/rv/Makefile.rv  |  51 +
 tools/verification/rv/src/Build|   4 +
 21 files changed, 650 insertions(+), 306 deletions(-)
 create mode 100644 tools/tracing/latency/Build
 create mode 100644 tools/tracing/latency/Makefile.config
 create mode 100644 tools/tracing/rtla/Build
 create mode 100644 tools/tracing/rtla/Makefile.config
 create mode 100644 tools/tracing/rtla/Makefile.rtla
 create mode 100644 tools/tracing/rtla/Makefile.standalone
 create mode 100644 tools/tracing/rtla/sample/timerlat_load.py
 create mode 100644 tools/tracing/rtla/src/Build
 create mode 100644 tools/verification/rv/.gitignore
 create mode 100644 tools/verification/rv/Build
 create mode 100644 tools/verification/rv/Makefile.config
 create mode 100644 tools/verification/rv/Makefile.rv
 create mode 100644 tools/verification/rv/src/Build
---

Re: [GIT PULL] tracing/tools: Updates for 6.9

2024-03-20 Thread Daniel Bristot de Oliveira

On 3/20/24 00:02, Steven Rostedt wrote:
> On Mon, 18 Mar 2024 18:41:13 +0100
> Daniel Bristot de Oliveira  wrote:
> 
>> Steven,
>>
>> Tracing tooling updates for 6.9
>>
>> Tracing:
>> - Update makefiles for latency-collector and RTLA,
>>   using tools/build/ makefiles like perf does, inheriting
>>   its benefits. For example, having a proper way to
>>   handle dependencies.
>>
>> - The timerlat tracer has an interface for any tool to use.
>>   rtla timerlat tool uses this interface dispatching its
>>   own threads as workload. But, rtla timerlat could also be
>>   used for any other process. So, add 'rtla timerlat -U'
>>   option, allowing the timerlat tool to measure the latency of
>>   any task using the timerlat tracer interface.
>>
>> Verification:
>> - Update makefiles for verification/rv, using tools/build/
>>   makefiles like perf does, inheriting its benefits.
>>   For example, having a proper way to handle dependencies.
>>
>>
>> Please pull the latest trace-tools-v6.9 tree, which can be found at:
>>
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git
>> trace-tools-v6.9
> 
> Looks like you just built on top of a random commit from Linus's tree:

yeah :-/

> commit f6cef5f8c37f58a3bc95b3754c3ae98e086631ca
> Merge: 906a93befec8 8f06fb458539
> Author: Linus Torvalds 
> Date:   Sun Mar 17 16:59:33 2024 -0700
> 
> Merge tag 'i3c/for-6.9' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
> 
> Linus prefers basing off of real tags or previous pulls from us.

Ack, took note. I will do on top v6.8 tag.

> Can you rebase your changes on v6.8 and resend?
> 
>   $ git checkout v6.8
>   $ git cherry-pick f6cef5f8c37f58a3bc95b3754c3ae98e086631ca..trace-tools-v6.9
> 
> Appears to work fine.

questions: when something go wrong in a pull request

- Should I keep the old tag, and then do another one with -2
  (it seems you do like this), or delete the old tag and send it again
  with the same name?

- Should I resend the PULL request with something in the log or
  at the Subject saying it is a v2 of the pull request?

I could ask via chat, but I think it is good for the community to
have access to these info.

Thanks!
-- Daniel

> Thanks!
> 
> -- Steve

[GIT PULL] tracing/tools: Updates for 6.9

2024-03-18 Thread Daniel Bristot de Oliveira

Steven,

Tracing tooling updates for 6.9

Tracing:
- Update makefiles for latency-collector and RTLA,
  using tools/build/ makefiles like perf does, inheriting
  its benefits. For example, having a proper way to
  handle dependencies.

- The timerlat tracer has an interface for any tool to use.
  rtla timerlat tool uses this interface dispatching its
  own threads as workload. But, rtla timerlat could also be
  used for any other process. So, add 'rtla timerlat -U'
  option, allowing the timerlat tool to measure the latency of
  any task using the timerlat tracer interface.

Verification:
- Update makefiles for verification/rv, using tools/build/
  makefiles like perf does, inheriting its benefits.
  For example, having a proper way to handle dependencies.


Please pull the latest trace-tools-v6.9 tree, which can be found at:


  git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git
trace-tools-v6.9

Tag SHA1: 2eb09a97c56af3c27bd9dcebccb495f70d56d5c0
Head SHA1: 9c63d9f58a42b979a42bcaed534d9246996ac0d9


Daniel Bristot de Oliveira (4):
  tools/tracing: Use tools/build makefiles on latency-collector
  tools/rtla: Use tools/build makefiles to build rtla
  tools/verification: Use tools/build makefiles on rv
  tools/rtla: Add -U/--user-load option to timerlat


 .../tools/rtla/common_timerlat_options.rst |   6 +
 tools/tracing/latency/.gitignore   |   5 +-
 tools/tracing/latency/Build|   1 +
 tools/tracing/latency/Makefile | 105 --
 tools/tracing/latency/Makefile.config  |  30 +++
 tools/tracing/rtla/.gitignore  |   7 +-
 tools/tracing/rtla/Build   |   1 +
 tools/tracing/rtla/Makefile| 217 +++--
 tools/tracing/rtla/Makefile.config |  47 +
 tools/tracing/rtla/Makefile.rtla   |  80 
 tools/tracing/rtla/Makefile.standalone |  26 +++
 tools/tracing/rtla/sample/timerlat_load.py |  74 +++
 tools/tracing/rtla/src/Build   |  11 ++
 tools/tracing/rtla/src/timerlat_hist.c |  16 +-
 tools/tracing/rtla/src/timerlat_top.c  |  14 +-
 tools/verification/rv/.gitignore   |   6 +
 tools/verification/rv/Build|   1 +
 tools/verification/rv/Makefile | 207 +++-
 tools/verification/rv/Makefile.config  |  47 +
 tools/verification/rv/Makefile.rv  |  51 +
 tools/verification/rv/src/Build|   4 +
 21 files changed, 650 insertions(+), 306 deletions(-)
 create mode 100644 tools/tracing/latency/Build
 create mode 100644 tools/tracing/latency/Makefile.config
 create mode 100644 tools/tracing/rtla/Build
 create mode 100644 tools/tracing/rtla/Makefile.config
 create mode 100644 tools/tracing/rtla/Makefile.rtla
 create mode 100644 tools/tracing/rtla/Makefile.standalone
 create mode 100644 tools/tracing/rtla/sample/timerlat_load.py
 create mode 100644 tools/tracing/rtla/src/Build
 create mode 100644 tools/verification/rv/.gitignore
 create mode 100644 tools/verification/rv/Build
 create mode 100644 tools/verification/rv/Makefile.config
 create mode 100644 tools/verification/rv/Makefile.rv
 create mode 100644 tools/verification/rv/src/Build
---

[PATCH V3 3/3] tools/verification: Use tools/build makefiles on rv

2024-03-15 Thread Daniel Bristot de Oliveira

Use tools/build/ makefiles to build rv, inheriting the benefits of
it. For example, having a proper way to handle dependencies.

Suggested-by: Linus Torvalds 
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/verification/rv/.gitignore  |   6 +
 tools/verification/rv/Build   |   1 +
 tools/verification/rv/Makefile| 207 +-
 tools/verification/rv/Makefile.config |  47 ++
 tools/verification/rv/Makefile.rv |  51 +++
 tools/verification/rv/src/Build   |   4 +
 6 files changed, 183 insertions(+), 133 deletions(-)
 create mode 100644 tools/verification/rv/.gitignore
 create mode 100644 tools/verification/rv/Build
 create mode 100644 tools/verification/rv/Makefile.config
 create mode 100644 tools/verification/rv/Makefile.rv
 create mode 100644 tools/verification/rv/src/Build

diff --git a/tools/verification/rv/.gitignore b/tools/verification/rv/.gitignore
new file mode 100644
index ..34a486585a34
--- /dev/null
+++ b/tools/verification/rv/.gitignore
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-only
+rv
+rv-static
+fixdep
+feature
+FEATURE-DUMP
diff --git a/tools/verification/rv/Build b/tools/verification/rv/Build
new file mode 100644
index ..a44c22349d4b
--- /dev/null
+++ b/tools/verification/rv/Build
@@ -0,0 +1 @@
+rv-y += src/
diff --git a/tools/verification/rv/Makefile b/tools/verification/rv/Makefile
index 485f8aeddbe0..411d62b3d8eb 100644
--- a/tools/verification/rv/Makefile
+++ b/tools/verification/rv/Makefile
@@ -1,146 +1,87 @@
-NAME   :=  rv
-# Follow the kernel version
-VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. 
kernelversion | grep -v make)
-
-# From libtracefs:
-# Makefiles suck: This macro sets a default value of $(2) for the
-# variable named by $(1), unless the variable has been set by
-# environment or command line. This is necessary for CC and AR
-# because make sets default values, so the simpler ?= approach
-# won't work as expected.
-define allow-override
-  $(if $(or $(findstring environment,$(origin $(1))),\
-$(findstring command line,$(origin $(1,,\
-$(eval $(1) = $(2)))
-endef
-
-# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
-$(call allow-override,CC,$(CROSS_COMPILE)gcc)
-$(call allow-override,AR,$(CROSS_COMPILE)ar)
-$(call allow-override,STRIP,$(CROSS_COMPILE)strip)
-$(call allow-override,PKG_CONFIG,pkg-config)
-$(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/)
-$(call allow-override,LDCONFIG,ldconfig)
-
-INSTALL=   install
-MKDIR  =   mkdir
-FOPTS  :=  -flto=auto -ffat-lto-objects -fexceptions 
-fstack-protector-strong \
-   -fasynchronous-unwind-tables -fstack-clash-protection
-WOPTS  :=  -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 
-Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
-
-ifeq ($(CC),clang)
-  FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
-  WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
+# SPDX-License-Identifier: GPL-2.0-only
+
+ifeq ($(srctree),)
+  srctree  := $(patsubst %/,%,$(dir $(CURDIR)))
+  srctree  := $(patsubst %/,%,$(dir $(srctree)))
+  srctree  := $(patsubst %/,%,$(dir $(srctree)))
 endif
 
-TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs)
-
-CFLAGS :=  -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) 
$(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -I include
-LDFLAGS:=  -flto=auto -ggdb $(EXTRA_LDFLAGS)
-LIBS   :=  $$($(PKG_CONFIG) --libs libtracefs)
-
-SRC:=  $(wildcard src/*.c)
-HDR:=  $(wildcard src/*.h)
-OBJ:=  $(SRC:.c=.o)
-DIRS   :=  src
-FILES  :=  Makefile README.txt
-CEXT   :=  bz2
-TARBALL:=  $(NAME)-$(VERSION).tar.$(CEXT)
-TAROPTS:=  -cvjf $(TARBALL)
-BINDIR :=  /usr/bin
-DATADIR:=  /usr/share
-DOCDIR :=  $(DATADIR)/doc
-MANDIR :=  $(DATADIR)/man
-LICDIR :=  $(DATADIR)/licenses
-SRCTREE:=  $(or $(BUILD_SRC),$(CURDIR))
-
-# If running from the tarball, man pages are stored in the Documentation
-# dir. If running from the kernel source, man pages are stored in
-# Documentation/tools/rv/.
-ifneq ($(wildcard Documentation/.*),)
-DOCSRC =   Documentation/
+include $(srctree)/tools/scripts/Makefile.include
+
+# O is an alias for OUTPUT
+OUTPUT  := $(O)
+
+ifeq ($(OUTPUT),)
+  OUTPUT   := $(CURDIR)/
 else
-DOCSRC =   $(SRCTREE)/../../../Documentation/tools/rv/
+  # subdir is used by the ../Makefile in $(call descend,)
+  ifneq ($(subdir),)
+OUTPUT:= $(OUTPUT)/$(subdir)
+  endif
+endif
+
+ifneq ($(patsubst %/,,$(lastword $(OUTPUT))),)
+  OUTPUT := $(OUTPUT)/
 endif
 
-LIBTRACEEVENT_MIN_VERSION = 1.5
-LIBTRACEFS_MIN_VERSION = 1.3
+RV := $(OUTPUT)rv
+RV_IN  := $(RV)-in.o
 
-.PHONY:all warnings show_warnings
-all:   warnings rv
+VERSION:= $(shell sh -c "make -sC ../../.. kernelversion | 
grep

[PATCH V3 2/3] tools/rtla: Use tools/build makefiles to build rtla

2024-03-15 Thread Daniel Bristot de Oliveira

Use tools/build/ makefiles to build rtla, inheriting the benefits of
it. For example, having a proper way to handle dependencies.

rtla is built using perf infra-structure when building inside the
kernel tree.

At this point, rtla diverges from perf in two points: Documentation
and tarball generation/build.

At the documentation level, rtla is one step ahead, placing the
documentation at Documentation/tools/rtla/, using the same build
tools as kernel documentation. The idea is to move perf
documentation to the same scheme and then share the same makefiles.

rtla has a tarball target that the (old) RHEL8 uses. The tarball was
kept using a simple standalone makefile for compatibility. The
standalone makefile shares most of the code, e.g., flags, with
regular buildings.

The tarball method was set as deprecated. If necessary, we can make
a rtla tarball like perf, which includes the entire tools/build.
But this would also require changes in the user side (the directory
structure changes, and probably the deps to build the package).

Inspired on perf and objtool.

Suggested-by: Linus Torvalds 
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/tracing/rtla/.gitignore  |   7 +-
 tools/tracing/rtla/Build   |   1 +
 tools/tracing/rtla/Makefile| 217 +
 tools/tracing/rtla/Makefile.config |  47 ++
 tools/tracing/rtla/Makefile.rtla   |  80 +
 tools/tracing/rtla/Makefile.standalone |  26 +++
 tools/tracing/rtla/src/Build   |  11 ++
 7 files changed, 244 insertions(+), 145 deletions(-)
 create mode 100644 tools/tracing/rtla/Build
 create mode 100644 tools/tracing/rtla/Makefile.config
 create mode 100644 tools/tracing/rtla/Makefile.rtla
 create mode 100644 tools/tracing/rtla/Makefile.standalone
 create mode 100644 tools/tracing/rtla/src/Build

diff --git a/tools/tracing/rtla/.gitignore b/tools/tracing/rtla/.gitignore
index e9df32419b2b..293f0dbb0ca2 100644
--- a/tools/tracing/rtla/.gitignore
+++ b/tools/tracing/rtla/.gitignore
@@ -1 +1,6 @@
-/rtla
+# SPDX-License-Identifier: GPL-2.0-only
+rtla
+rtla-static
+fixdep
+feature
+FEATURE-DUMP
diff --git a/tools/tracing/rtla/Build b/tools/tracing/rtla/Build
new file mode 100644
index ..6c9d5b36a315
--- /dev/null
+++ b/tools/tracing/rtla/Build
@@ -0,0 +1 @@
+rtla-y += src/
diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile
index afd18c678ff5..b5878be36125 100644
--- a/tools/tracing/rtla/Makefile
+++ b/tools/tracing/rtla/Makefile
@@ -1,157 +1,86 @@
-NAME   :=  rtla
-# Follow the kernel version
-VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. 
kernelversion | grep -v make)
-
-# From libtracefs:
-# Makefiles suck: This macro sets a default value of $(2) for the
-# variable named by $(1), unless the variable has been set by
-# environment or command line. This is necessary for CC and AR
-# because make sets default values, so the simpler ?= approach
-# won't work as expected.
-define allow-override
-  $(if $(or $(findstring environment,$(origin $(1))),\
-$(findstring command line,$(origin $(1,,\
-$(eval $(1) = $(2)))
-endef
-
-# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
-$(call allow-override,CC,$(CROSS_COMPILE)gcc)
-$(call allow-override,AR,$(CROSS_COMPILE)ar)
-$(call allow-override,STRIP,$(CROSS_COMPILE)strip)
-$(call allow-override,PKG_CONFIG,pkg-config)
-$(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/)
-$(call allow-override,LDCONFIG,ldconfig)
-
-INSTALL=   install
-MKDIR  =   mkdir
-FOPTS  :=  -flto=auto -ffat-lto-objects -fexceptions 
-fstack-protector-strong \
-   -fasynchronous-unwind-tables -fstack-clash-protection
-WOPTS  :=  -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 
-Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
-
-ifeq ($(CC),clang)
-  FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
-  WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
+# SPDX-License-Identifier: GPL-2.0-only
+
+ifeq ($(srctree),)
+  srctree  := $(patsubst %/,%,$(dir $(CURDIR)))
+  srctree  := $(patsubst %/,%,$(dir $(srctree)))
+  srctree  := $(patsubst %/,%,$(dir $(srctree)))
 endif
 
-TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs)
-
-CFLAGS :=  -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) 
$(TRACEFS_HEADERS) $(EXTRA_CFLAGS)
-LDFLAGS:=  -flto=auto -ggdb $(EXTRA_LDFLAGS)
-LIBS   :=  $$($(PKG_CONFIG) --libs libtracefs)
-
-SRC:=  $(wildcard src/*.c)
-HDR:=  $(wildcard src/*.h)
-OBJ:=  $(SRC:.c=.o)
-DIRS   :=  src
-FILES  :=  Makefile README.txt
-CEXT   :=  bz2
-TARBALL:=  $(NAME)-$(VERSION).tar.$(CEXT)
-TAROPTS:=  -cvjf $(TARBALL)
-BINDIR :=  /usr/bin
-DATADIR:=  /usr/share
-DOCDIR :=  $(DATADIR)/doc
-MANDIR :=  $(DATADIR)/man
-LICDIR :=  $(DATADIR)/licenses
-SRCTREE:=  $(o

[PATCH V3 1/3] tools/tracing: Use tools/build makefiles on latency-collector

2024-03-15 Thread Daniel Bristot de Oliveira

Use tools/build/ makefiles to build latency-collector, inheriting
the benefits of it. For example: Before this patch, a missing
tracefs/traceevents headers will result in fail like this:

  ~/linux/tools/tracing/latency $ make
  cc -Wall -Wextra -g -O2  -o latency-collector latency-collector.c -lpthread
  latency-collector.c:26:10: fatal error: tracefs.h: No such file or directory
 26 | #include 
|  ^~~
  compilation terminated.
  make: *** [Makefile:14: latency-collector] Error 1

Which is not that helpful. After this change it reports:

  ~/linux/tools/tracing/latency# make

  Auto-detecting system features:
  ...   libtraceevent: [ OFF ]
  ...  libtracefs: [ OFF ]

  libtraceevent is missing. Please install libtraceevent-dev/libtraceevent-devel
  libtracefs is missing. Please install libtracefs-dev/libtracefs-devel
  Makefile.config:29: *** Please, check the errors above..  Stop.

This type of output is common across other tools in tools/ like perf
and objtool.

Suggested-by: Linus Torvalds 
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/tracing/latency/.gitignore  |   5 +-
 tools/tracing/latency/Build   |   1 +
 tools/tracing/latency/Makefile| 105 +-
 tools/tracing/latency/Makefile.config |  30 
 4 files changed, 122 insertions(+), 19 deletions(-)
 create mode 100644 tools/tracing/latency/Build
 create mode 100644 tools/tracing/latency/Makefile.config

diff --git a/tools/tracing/latency/.gitignore b/tools/tracing/latency/.gitignore
index 0863960761e7..2bb8e60f7fdd 100644
--- a/tools/tracing/latency/.gitignore
+++ b/tools/tracing/latency/.gitignore
@@ -1,2 +1,5 @@
-# SPDX-License-Identifier: GPL-2.0
+# SPDX-License-Identifier: GPL-2.0-only
 latency-collector
+fixdep
+feature
+FEATURE-DUMP
diff --git a/tools/tracing/latency/Build b/tools/tracing/latency/Build
new file mode 100644
index ..0ce65ea72bf9
--- /dev/null
+++ b/tools/tracing/latency/Build
@@ -0,0 +1 @@
+latency-collector-y += latency-collector.o
diff --git a/tools/tracing/latency/Makefile b/tools/tracing/latency/Makefile
index 40c4ddaf8be1..6518b03e05c7 100644
--- a/tools/tracing/latency/Makefile
+++ b/tools/tracing/latency/Makefile
@@ -1,24 +1,93 @@
-# SPDX-License-Identifier: GPL-2.0
-# Makefile for vm tools
-#
-VAR_CFLAGS := $(shell pkg-config --cflags libtracefs 2>/dev/null)
-VAR_LDLIBS := $(shell pkg-config --libs libtracefs 2>/dev/null)
+# SPDX-License-Identifier: GPL-2.0-only
 
-TARGETS = latency-collector
-CFLAGS = -Wall -Wextra -g -O2 $(VAR_CFLAGS)
-LDFLAGS = -lpthread $(VAR_LDLIBS)
+ifeq ($(srctree),)
+  srctree  := $(patsubst %/,%,$(dir $(CURDIR)))
+  srctree  := $(patsubst %/,%,$(dir $(srctree)))
+  srctree  := $(patsubst %/,%,$(dir $(srctree)))
+endif
 
-all: $(TARGETS)
+include $(srctree)/tools/scripts/Makefile.include
 
-%: %.c
-   $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
+# O is an alias for OUTPUT
+OUTPUT := $(O)
 
-clean:
-   $(RM) latency-collector
+ifeq ($(OUTPUT),)
+  OUTPUT   := $(CURDIR)
+else
+  # subdir is used by the ../Makefile in $(call descend,)
+  ifneq ($(subdir),)
+OUTPUT := $(OUTPUT)/$(subdir)
+  endif
+endif
+
+ifneq ($(patsubst %/,,$(lastword $(OUTPUT))),)
+  OUTPUT   := $(OUTPUT)/
+endif
+
+LATENCY-COLLECTOR  := $(OUTPUT)latency-collector
+LATENCY-COLLECTOR_IN   := $(LATENCY-COLLECTOR)-in.o
+
+export CC  := gcc
+export LD  := ld
+export AR  := ar
+export PKG_CONFIG := pkg-config
+
+FEATURE_TESTS  := libtraceevent
+FEATURE_TESTS  += libtracefs
+FEATURE_DISPLAY:= libtraceevent
+FEATURE_DISPLAY+= libtracefs
+
+ifeq ($(V),1)
+  Q=
+else
+  Q= @
+endif
+
+all: $(LATENCY-COLLECTOR)
+
+include $(srctree)/tools/build/Makefile.include
+
+# check for dependencies only on required targets
+NON_CONFIG_TARGETS := clean install
 
-prefix ?= /usr/local
-sbindir ?= ${prefix}/sbin
+config := 1
+ifdef MAKECMDGOALS
+ifeq ($(filter-out $(NON_CONFIG_TARGETS),$(MAKECMDGOALS)),)
+ config:= 0
+endif
+endif
 
-install: all
-   install -d $(DESTDIR)$(sbindir)
-   install -m 755 -p $(TARGETS) $(DESTDIR)$(sbindir)
+ifeq ($(config),1)
+  include $(srctree)/tools/build/Makefile.feature
+  include Makefile.config
+endif
+
+CFLAGS += $(INCLUDES) $(LIB_INCLUDES)
+
+export CFLAGS OUTPUT srctree
+
+$(LATENCY-COLLECTOR): $(LATENCY-COLLECTOR_IN)
+   $(QUIET_LINK)$(CC) $(LDFLAGS) -o $(LATENCY-COLLECTOR) 
$(LATENCY-COLLECTOR_IN) $(EXTLIBS)
+
+latency-collector.%: fixdep FORCE
+   make -f $(srctree)/tools/build/Makefile.build dir=. $@
+
+$(LATENCY-COLLECTOR_IN): fixdep FORCE
+   make $(build)=latency-collector
+
+INSTALL:= install
+MKDIR  := mkdir
+STRIP  := strip
+BINDIR := /usr/bin
+
+install:
+   @$(MKDIR) -p $(DESTDIR)$(BINDIR)
+   $(call QUIET_INSTALL,latency-collector)$(INSTALL) $(LATENCY-COLLE

[PATCH V3 0/3] tools/tracing: Use tools/build makefiles like perf

2024-03-15 Thread Daniel Bristot de Oliveira

tools/tracing and tools/verification/rv are using standalone Makefiles
and this approach has some drawbacks. For example, code duplication
and lack of proper dependency handling, making things harder for
users. 

Linus suggested using perf's build system, and it is indeed the best way to go.

This series replaces tools/tracing and tools/verification/rv makefiles
with makefiles inspired by perf and objtool that use tools/build/
infrastructure.

Thanks, Arnaldo, for the pointers via chat.

Link: 
https://lore.kernel.org/lkml/CAHk-=wjq9bjkbpi3sjn2dy5jvwpo03u9aoc6-g8anlcgq-e...@mail.gmail.com/

Changes from V2:
  - Link: https://lore.kernel.org/lkml/cover.1710512430.git.bris...@kernel.org/
  - Fix the log of the first patch, removing the '- %< ' because
some scripts interpret them as the '---' patch separator, truncating
the log.

Changes from V1:
  - Link: https://lore.kernel.org/lkml/cover.1709914259.git.bris...@kernel.org/
  - Proper handle O= and OUTPUT= flags fixing "make tools/tracing"
  - Cleanups

Daniel Bristot de Oliveira (3):
  tools/tracing: Use tools/build makefiles on latency-collector
  tools/rtla: Use tools/build makefiles to build rtla
  tools/verification: Use tools/build makefiles on rv

 tools/tracing/latency/.gitignore   |   5 +-
 tools/tracing/latency/Build|   1 +
 tools/tracing/latency/Makefile | 105 ++--
 tools/tracing/latency/Makefile.config  |  30 
 tools/tracing/rtla/.gitignore  |   7 +-
 tools/tracing/rtla/Build   |   1 +
 tools/tracing/rtla/Makefile| 217 +
 tools/tracing/rtla/Makefile.config |  47 ++
 tools/tracing/rtla/Makefile.rtla   |  80 +
 tools/tracing/rtla/Makefile.standalone |  26 +++
 tools/tracing/rtla/src/Build   |  11 ++
 tools/verification/rv/.gitignore   |   6 +
 tools/verification/rv/Build|   1 +
 tools/verification/rv/Makefile | 207 +--
 tools/verification/rv/Makefile.config  |  47 ++
 tools/verification/rv/Makefile.rv  |  51 ++
 tools/verification/rv/src/Build|   4 +
 17 files changed, 549 insertions(+), 297 deletions(-)
 create mode 100644 tools/tracing/latency/Build
 create mode 100644 tools/tracing/latency/Makefile.config
 create mode 100644 tools/tracing/rtla/Build
 create mode 100644 tools/tracing/rtla/Makefile.config
 create mode 100644 tools/tracing/rtla/Makefile.rtla
 create mode 100644 tools/tracing/rtla/Makefile.standalone
 create mode 100644 tools/tracing/rtla/src/Build
 create mode 100644 tools/verification/rv/.gitignore
 create mode 100644 tools/verification/rv/Build
 create mode 100644 tools/verification/rv/Makefile.config
 create mode 100644 tools/verification/rv/Makefile.rv
 create mode 100644 tools/verification/rv/src/Build

-- 
2.44.0

Re: [PATCH V2 1/3] tools/tracing: Use tools/build makefiles on latency-collector

2024-03-15 Thread Daniel Bristot de Oliveira

On 3/15/24 15:53, Arnaldo Carvalho de Melo wrote:
> On Fri, Mar 15, 2024 at 03:48:58PM +0100, Daniel Bristot de Oliveira wrote:
>> On 3/15/24 15:24, Daniel Bristot de Oliveira wrote:
>>> Use tools/build/ makefiles to build latency-collector, inheriting
>>> the benefits of it. For example: Before this patch, a missing
>>> tracefs/traceevents headers will result in fail like this:
>>>
>>>  %< ---
>>
>> Oops, b4 is interpreting these -- as the '---' separator, and is 
>> truncating
>> the message. I will fix this in a v3.
>>
>> sorry.
> 
> Yeah, that confuses scripts, that separator.

checkpatch did not point that as an error, and git am works as well but it
indeed confuses other scripts...

Thanks, Arnaldo.
-- Daniel

Re: [PATCH V2 1/3] tools/tracing: Use tools/build makefiles on latency-collector

2024-03-15 Thread Daniel Bristot de Oliveira

On 3/15/24 15:24, Daniel Bristot de Oliveira wrote:
> Use tools/build/ makefiles to build latency-collector, inheriting
> the benefits of it. For example: Before this patch, a missing
> tracefs/traceevents headers will result in fail like this:
> 
>  %< ---

Oops, b4 is interpreting these -- as the '---' separator, and is truncating
the message. I will fix this in a v3.

sorry.

-- Daniel

[PATCH V2 3/3] tools/verification: Use tools/build makefiles on rv

2024-03-15 Thread Daniel Bristot de Oliveira

Use tools/build/ makefiles to build rv, inheriting the benefits of
it. For example, having a proper way to handle dependencies.

Suggested-by: Linus Torvalds 
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/verification/rv/.gitignore  |   6 +
 tools/verification/rv/Build   |   1 +
 tools/verification/rv/Makefile| 207 +-
 tools/verification/rv/Makefile.config |  47 ++
 tools/verification/rv/Makefile.rv |  51 +++
 tools/verification/rv/src/Build   |   4 +
 6 files changed, 183 insertions(+), 133 deletions(-)
 create mode 100644 tools/verification/rv/.gitignore
 create mode 100644 tools/verification/rv/Build
 create mode 100644 tools/verification/rv/Makefile.config
 create mode 100644 tools/verification/rv/Makefile.rv
 create mode 100644 tools/verification/rv/src/Build

diff --git a/tools/verification/rv/.gitignore b/tools/verification/rv/.gitignore
new file mode 100644
index ..34a486585a34
--- /dev/null
+++ b/tools/verification/rv/.gitignore
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-only
+rv
+rv-static
+fixdep
+feature
+FEATURE-DUMP
diff --git a/tools/verification/rv/Build b/tools/verification/rv/Build
new file mode 100644
index ..a44c22349d4b
--- /dev/null
+++ b/tools/verification/rv/Build
@@ -0,0 +1 @@
+rv-y += src/
diff --git a/tools/verification/rv/Makefile b/tools/verification/rv/Makefile
index 485f8aeddbe0..411d62b3d8eb 100644
--- a/tools/verification/rv/Makefile
+++ b/tools/verification/rv/Makefile
@@ -1,146 +1,87 @@
-NAME   :=  rv
-# Follow the kernel version
-VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. 
kernelversion | grep -v make)
-
-# From libtracefs:
-# Makefiles suck: This macro sets a default value of $(2) for the
-# variable named by $(1), unless the variable has been set by
-# environment or command line. This is necessary for CC and AR
-# because make sets default values, so the simpler ?= approach
-# won't work as expected.
-define allow-override
-  $(if $(or $(findstring environment,$(origin $(1))),\
-$(findstring command line,$(origin $(1,,\
-$(eval $(1) = $(2)))
-endef
-
-# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
-$(call allow-override,CC,$(CROSS_COMPILE)gcc)
-$(call allow-override,AR,$(CROSS_COMPILE)ar)
-$(call allow-override,STRIP,$(CROSS_COMPILE)strip)
-$(call allow-override,PKG_CONFIG,pkg-config)
-$(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/)
-$(call allow-override,LDCONFIG,ldconfig)
-
-INSTALL=   install
-MKDIR  =   mkdir
-FOPTS  :=  -flto=auto -ffat-lto-objects -fexceptions 
-fstack-protector-strong \
-   -fasynchronous-unwind-tables -fstack-clash-protection
-WOPTS  :=  -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 
-Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
-
-ifeq ($(CC),clang)
-  FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
-  WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
+# SPDX-License-Identifier: GPL-2.0-only
+
+ifeq ($(srctree),)
+  srctree  := $(patsubst %/,%,$(dir $(CURDIR)))
+  srctree  := $(patsubst %/,%,$(dir $(srctree)))
+  srctree  := $(patsubst %/,%,$(dir $(srctree)))
 endif
 
-TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs)
-
-CFLAGS :=  -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) 
$(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -I include
-LDFLAGS:=  -flto=auto -ggdb $(EXTRA_LDFLAGS)
-LIBS   :=  $$($(PKG_CONFIG) --libs libtracefs)
-
-SRC:=  $(wildcard src/*.c)
-HDR:=  $(wildcard src/*.h)
-OBJ:=  $(SRC:.c=.o)
-DIRS   :=  src
-FILES  :=  Makefile README.txt
-CEXT   :=  bz2
-TARBALL:=  $(NAME)-$(VERSION).tar.$(CEXT)
-TAROPTS:=  -cvjf $(TARBALL)
-BINDIR :=  /usr/bin
-DATADIR:=  /usr/share
-DOCDIR :=  $(DATADIR)/doc
-MANDIR :=  $(DATADIR)/man
-LICDIR :=  $(DATADIR)/licenses
-SRCTREE:=  $(or $(BUILD_SRC),$(CURDIR))
-
-# If running from the tarball, man pages are stored in the Documentation
-# dir. If running from the kernel source, man pages are stored in
-# Documentation/tools/rv/.
-ifneq ($(wildcard Documentation/.*),)
-DOCSRC =   Documentation/
+include $(srctree)/tools/scripts/Makefile.include
+
+# O is an alias for OUTPUT
+OUTPUT  := $(O)
+
+ifeq ($(OUTPUT),)
+  OUTPUT   := $(CURDIR)/
 else
-DOCSRC =   $(SRCTREE)/../../../Documentation/tools/rv/
+  # subdir is used by the ../Makefile in $(call descend,)
+  ifneq ($(subdir),)
+OUTPUT:= $(OUTPUT)/$(subdir)
+  endif
+endif
+
+ifneq ($(patsubst %/,,$(lastword $(OUTPUT))),)
+  OUTPUT := $(OUTPUT)/
 endif
 
-LIBTRACEEVENT_MIN_VERSION = 1.5
-LIBTRACEFS_MIN_VERSION = 1.3
+RV := $(OUTPUT)rv
+RV_IN  := $(RV)-in.o
 
-.PHONY:all warnings show_warnings
-all:   warnings rv
+VERSION:= $(shell sh -c "make -sC ../../.. kernelversion | 
grep

[PATCH V2 2/3] tools/rtla: Use tools/build makefiles to build rtla

2024-03-15 Thread Daniel Bristot de Oliveira

Use tools/build/ makefiles to build rtla, inheriting the benefits of
it. For example, having a proper way to handle dependencies.

rtla is built using perf infra-structure when building inside the
kernel tree.

At this point, rtla diverges from perf in two points: Documentation
and tarball generation/build.

At the documentation level, rtla is one step ahead, placing the
documentation at Documentation/tools/rtla/, using the same build
tools as kernel documentation. The idea is to move perf
documentation to the same scheme and then share the same makefiles.

rtla has a tarball target that the (old) RHEL8 uses. The tarball was
kept using a simple standalone makefile for compatibility. The
standalone makefile shares most of the code, e.g., flags, with
regular buildings.

The tarball method was set as deprecated. If necessary, we can make
a rtla tarball like perf, which includes the entire tools/build.
But this would also require changes in the user side (the directory
structure changes, and probably the deps to build the package).

Inspired on perf and objtool.

Suggested-by: Linus Torvalds 
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/tracing/rtla/.gitignore  |   7 +-
 tools/tracing/rtla/Build   |   1 +
 tools/tracing/rtla/Makefile| 217 +
 tools/tracing/rtla/Makefile.config |  47 ++
 tools/tracing/rtla/Makefile.rtla   |  80 +
 tools/tracing/rtla/Makefile.standalone |  26 +++
 tools/tracing/rtla/src/Build   |  11 ++
 7 files changed, 244 insertions(+), 145 deletions(-)
 create mode 100644 tools/tracing/rtla/Build
 create mode 100644 tools/tracing/rtla/Makefile.config
 create mode 100644 tools/tracing/rtla/Makefile.rtla
 create mode 100644 tools/tracing/rtla/Makefile.standalone
 create mode 100644 tools/tracing/rtla/src/Build

diff --git a/tools/tracing/rtla/.gitignore b/tools/tracing/rtla/.gitignore
index e9df32419b2b..293f0dbb0ca2 100644
--- a/tools/tracing/rtla/.gitignore
+++ b/tools/tracing/rtla/.gitignore
@@ -1 +1,6 @@
-/rtla
+# SPDX-License-Identifier: GPL-2.0-only
+rtla
+rtla-static
+fixdep
+feature
+FEATURE-DUMP
diff --git a/tools/tracing/rtla/Build b/tools/tracing/rtla/Build
new file mode 100644
index ..6c9d5b36a315
--- /dev/null
+++ b/tools/tracing/rtla/Build
@@ -0,0 +1 @@
+rtla-y += src/
diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile
index afd18c678ff5..b5878be36125 100644
--- a/tools/tracing/rtla/Makefile
+++ b/tools/tracing/rtla/Makefile
@@ -1,157 +1,86 @@
-NAME   :=  rtla
-# Follow the kernel version
-VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. 
kernelversion | grep -v make)
-
-# From libtracefs:
-# Makefiles suck: This macro sets a default value of $(2) for the
-# variable named by $(1), unless the variable has been set by
-# environment or command line. This is necessary for CC and AR
-# because make sets default values, so the simpler ?= approach
-# won't work as expected.
-define allow-override
-  $(if $(or $(findstring environment,$(origin $(1))),\
-$(findstring command line,$(origin $(1,,\
-$(eval $(1) = $(2)))
-endef
-
-# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
-$(call allow-override,CC,$(CROSS_COMPILE)gcc)
-$(call allow-override,AR,$(CROSS_COMPILE)ar)
-$(call allow-override,STRIP,$(CROSS_COMPILE)strip)
-$(call allow-override,PKG_CONFIG,pkg-config)
-$(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/)
-$(call allow-override,LDCONFIG,ldconfig)
-
-INSTALL=   install
-MKDIR  =   mkdir
-FOPTS  :=  -flto=auto -ffat-lto-objects -fexceptions 
-fstack-protector-strong \
-   -fasynchronous-unwind-tables -fstack-clash-protection
-WOPTS  :=  -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 
-Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
-
-ifeq ($(CC),clang)
-  FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
-  WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
+# SPDX-License-Identifier: GPL-2.0-only
+
+ifeq ($(srctree),)
+  srctree  := $(patsubst %/,%,$(dir $(CURDIR)))
+  srctree  := $(patsubst %/,%,$(dir $(srctree)))
+  srctree  := $(patsubst %/,%,$(dir $(srctree)))
 endif
 
-TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs)
-
-CFLAGS :=  -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) 
$(TRACEFS_HEADERS) $(EXTRA_CFLAGS)
-LDFLAGS:=  -flto=auto -ggdb $(EXTRA_LDFLAGS)
-LIBS   :=  $$($(PKG_CONFIG) --libs libtracefs)
-
-SRC:=  $(wildcard src/*.c)
-HDR:=  $(wildcard src/*.h)
-OBJ:=  $(SRC:.c=.o)
-DIRS   :=  src
-FILES  :=  Makefile README.txt
-CEXT   :=  bz2
-TARBALL:=  $(NAME)-$(VERSION).tar.$(CEXT)
-TAROPTS:=  -cvjf $(TARBALL)
-BINDIR :=  /usr/bin
-DATADIR:=  /usr/share
-DOCDIR :=  $(DATADIR)/doc
-MANDIR :=  $(DATADIR)/man
-LICDIR :=  $(DATADIR)/licenses
-SRCTREE:=  $(o

[PATCH V2 1/3] tools/tracing: Use tools/build makefiles on latency-collector

2024-03-15 Thread Daniel Bristot de Oliveira

Use tools/build/ makefiles to build latency-collector, inheriting
the benefits of it. For example: Before this patch, a missing
tracefs/traceevents headers will result in fail like this:

 %< ---
~/linux/tools/tracing/latency $ make
cc -Wall -Wextra -g -O2  -o latency-collector latency-collector.c -lpthread
latency-collector.c:26:10: fatal error: tracefs.h: No such file or directory
   26 | #include 
  |  ^~~
compilation terminated.
make: *** [Makefile:14: latency-collector] Error 1
 %< ---

Which is not that helpful. After this change it reports:

 %< ---
~/linux/tools/tracing/latency# make

Auto-detecting system features:
...   libtraceevent: [ OFF ]
...  libtracefs: [ OFF ]

libtraceevent is missing. Please install libtraceevent-dev/libtraceevent-devel
libtracefs is missing. Please install libtracefs-dev/libtracefs-devel
Makefile.config:29: *** Please, check the errors above..  Stop.
 %< ---

This type of output is common across other tools in tools/ like perf
and objtool.

Suggested-by: Linus Torvalds 
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/tracing/latency/.gitignore  |   5 +-
 tools/tracing/latency/Build   |   1 +
 tools/tracing/latency/Makefile| 105 +-
 tools/tracing/latency/Makefile.config |  30 
 4 files changed, 122 insertions(+), 19 deletions(-)
 create mode 100644 tools/tracing/latency/Build
 create mode 100644 tools/tracing/latency/Makefile.config

diff --git a/tools/tracing/latency/.gitignore b/tools/tracing/latency/.gitignore
index 0863960761e7..2bb8e60f7fdd 100644
--- a/tools/tracing/latency/.gitignore
+++ b/tools/tracing/latency/.gitignore
@@ -1,2 +1,5 @@
-# SPDX-License-Identifier: GPL-2.0
+# SPDX-License-Identifier: GPL-2.0-only
 latency-collector
+fixdep
+feature
+FEATURE-DUMP
diff --git a/tools/tracing/latency/Build b/tools/tracing/latency/Build
new file mode 100644
index ..0ce65ea72bf9
--- /dev/null
+++ b/tools/tracing/latency/Build
@@ -0,0 +1 @@
+latency-collector-y += latency-collector.o
diff --git a/tools/tracing/latency/Makefile b/tools/tracing/latency/Makefile
index 40c4ddaf8be1..6518b03e05c7 100644
--- a/tools/tracing/latency/Makefile
+++ b/tools/tracing/latency/Makefile
@@ -1,24 +1,93 @@
-# SPDX-License-Identifier: GPL-2.0
-# Makefile for vm tools
-#
-VAR_CFLAGS := $(shell pkg-config --cflags libtracefs 2>/dev/null)
-VAR_LDLIBS := $(shell pkg-config --libs libtracefs 2>/dev/null)
+# SPDX-License-Identifier: GPL-2.0-only
 
-TARGETS = latency-collector
-CFLAGS = -Wall -Wextra -g -O2 $(VAR_CFLAGS)
-LDFLAGS = -lpthread $(VAR_LDLIBS)
+ifeq ($(srctree),)
+  srctree  := $(patsubst %/,%,$(dir $(CURDIR)))
+  srctree  := $(patsubst %/,%,$(dir $(srctree)))
+  srctree  := $(patsubst %/,%,$(dir $(srctree)))
+endif
 
-all: $(TARGETS)
+include $(srctree)/tools/scripts/Makefile.include
 
-%: %.c
-   $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
+# O is an alias for OUTPUT
+OUTPUT := $(O)
 
-clean:
-   $(RM) latency-collector
+ifeq ($(OUTPUT),)
+  OUTPUT   := $(CURDIR)
+else
+  # subdir is used by the ../Makefile in $(call descend,)
+  ifneq ($(subdir),)
+OUTPUT := $(OUTPUT)/$(subdir)
+  endif
+endif
+
+ifneq ($(patsubst %/,,$(lastword $(OUTPUT))),)
+  OUTPUT   := $(OUTPUT)/
+endif
+
+LATENCY-COLLECTOR  := $(OUTPUT)latency-collector
+LATENCY-COLLECTOR_IN   := $(LATENCY-COLLECTOR)-in.o
+
+export CC  := gcc
+export LD  := ld
+export AR  := ar
+export PKG_CONFIG := pkg-config
+
+FEATURE_TESTS  := libtraceevent
+FEATURE_TESTS  += libtracefs
+FEATURE_DISPLAY:= libtraceevent
+FEATURE_DISPLAY+= libtracefs
+
+ifeq ($(V),1)
+  Q=
+else
+  Q= @
+endif
+
+all: $(LATENCY-COLLECTOR)
+
+include $(srctree)/tools/build/Makefile.include
+
+# check for dependencies only on required targets
+NON_CONFIG_TARGETS := clean install
 
-prefix ?= /usr/local
-sbindir ?= ${prefix}/sbin
+config := 1
+ifdef MAKECMDGOALS
+ifeq ($(filter-out $(NON_CONFIG_TARGETS),$(MAKECMDGOALS)),)
+ config:= 0
+endif
+endif
 
-install: all
-   install -d $(DESTDIR)$(sbindir)
-   install -m 755 -p $(TARGETS) $(DESTDIR)$(sbindir)
+ifeq ($(config),1)
+  include $(srctree)/tools/build/Makefile.feature
+  include Makefile.config
+endif
+
+CFLAGS += $(INCLUDES) $(LIB_INCLUDES)
+
+export CFLAGS OUTPUT srctree
+
+$(LATENCY-COLLECTOR): $(LATENCY-COLLECTOR_IN)
+   $(QUIET_LINK)$(CC) $(LDFLAGS) -o $(LATENCY-COLLECTOR) 
$(LATENCY-COLLECTOR_IN) $(EXTLIBS)
+
+latency-collector.%: fixdep FORCE
+   make -f $(srctree)/tools/build/Makefile.build dir=. $@
+
+$(LATENCY-COLLECTOR_IN): fixdep FORCE
+   make $(bui

[PATCH V2 0/3] tools/tracing: Use tools/build makefiles like perf

2024-03-15 Thread Daniel Bristot de Oliveira

tools/tracing and tools/verification/rv are using standalone Makefiles
and this approach has some drawbacks. For example, code duplication
and lack of proper dependency handling, making things harder for
users. 

Linus suggested using perf's build system, and it is indeed the best way to go.

This series replaces tools/tracing and tools/verification/rv makefiles
with makefiles inspired by perf and objtool that use tools/build/
infrastructure.

Thanks, Arnaldo, for the pointers via chat.

Link: 
https://lore.kernel.org/lkml/CAHk-=wjq9bjkbpi3sjn2dy5jvwpo03u9aoc6-g8anlcgq-e...@mail.gmail.com/

Changes from V1:
  - Link: https://lore.kernel.org/lkml/cover.1709914259.git.bris...@kernel.org/
  - Proper handle O= and OUTPUT= flags fixing "make tools/tracing"
  - Cleanups

Daniel Bristot de Oliveira (3):
  tools/tracing: Use tools/build makefiles on latency-collector
  tools/rtla: Use tools/build makefiles to build rtla
  tools/verification: Use tools/build makefiles on rv

 tools/tracing/latency/.gitignore   |   5 +-
 tools/tracing/latency/Build|   1 +
 tools/tracing/latency/Makefile | 105 ++--
 tools/tracing/latency/Makefile.config  |  30 
 tools/tracing/rtla/.gitignore  |   7 +-
 tools/tracing/rtla/Build   |   1 +
 tools/tracing/rtla/Makefile| 217 +
 tools/tracing/rtla/Makefile.config |  47 ++
 tools/tracing/rtla/Makefile.rtla   |  80 +
 tools/tracing/rtla/Makefile.standalone |  26 +++
 tools/tracing/rtla/src/Build   |  11 ++
 tools/verification/rv/.gitignore   |   6 +
 tools/verification/rv/Build|   1 +
 tools/verification/rv/Makefile | 207 +--
 tools/verification/rv/Makefile.config  |  47 ++
 tools/verification/rv/Makefile.rv  |  51 ++
 tools/verification/rv/src/Build|   4 +
 17 files changed, 549 insertions(+), 297 deletions(-)
 create mode 100644 tools/tracing/latency/Build
 create mode 100644 tools/tracing/latency/Makefile.config
 create mode 100644 tools/tracing/rtla/Build
 create mode 100644 tools/tracing/rtla/Makefile.config
 create mode 100644 tools/tracing/rtla/Makefile.rtla
 create mode 100644 tools/tracing/rtla/Makefile.standalone
 create mode 100644 tools/tracing/rtla/src/Build
 create mode 100644 tools/verification/rv/.gitignore
 create mode 100644 tools/verification/rv/Build
 create mode 100644 tools/verification/rv/Makefile.config
 create mode 100644 tools/verification/rv/Makefile.rv
 create mode 100644 tools/verification/rv/src/Build

-- 
2.44.0

[PATCH 3/3] tools/rtla: Use tools/build makefiles to build rtla

2024-03-08 Thread Daniel Bristot de Oliveira

Use tools/build/ makefiles to build rtla, inheriting the benefits of
it. For example, having a proper way to handle dependencies.

rtla is built using perf infra-structure when building inside the
kernel tree.

At this point, rtla diverges from perf in two points: Documentation
and tarball generation/build.

At the documentation level, rtla is one step ahead, placing the
documentation at Documentation/tools/rtla/, using the same build
tools as kernel documentation. The idea is to move perf
documentation to the same scheme and then share the same makefiles.

rtla has a tarball target that the (old) RHEL8 uses. The tarball was
kept using a simple standalone makefile for compatibility. The
standalone makefile shares most of the code, e.g., flags, with
regular buildings.

The tarball method was set as deprecated. If necessary, we can make
a rtla tarball like perf, which includes the entire tools/build.
But this would also require changes in the user side (the directory
structure changes, and probably the deps to build the package).

Inspired on perf and objtool.

Suggested-by: Linus Torvalds 
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/tracing/rtla/.gitignore  |   4 +-
 tools/tracing/rtla/Build   |   1 +
 tools/tracing/rtla/Makefile| 177 +
 tools/tracing/rtla/Makefile.config |  44 ++
 tools/tracing/rtla/Makefile.rtla   |  77 +++
 tools/tracing/rtla/Makefile.standalone |  24 
 tools/tracing/rtla/src/Build   |  11 ++
 7 files changed, 193 insertions(+), 145 deletions(-)
 create mode 100644 tools/tracing/rtla/Build
 create mode 100644 tools/tracing/rtla/Makefile.config
 create mode 100644 tools/tracing/rtla/Makefile.rtla
 create mode 100644 tools/tracing/rtla/Makefile.standalone
 create mode 100644 tools/tracing/rtla/src/Build

diff --git a/tools/tracing/rtla/.gitignore b/tools/tracing/rtla/.gitignore
index e9df32419b2b..9ec8a142e157 100644
--- a/tools/tracing/rtla/.gitignore
+++ b/tools/tracing/rtla/.gitignore
@@ -1 +1,3 @@
-/rtla
+rtla
+rtla-static
+FEATURE-DUMP
diff --git a/tools/tracing/rtla/Build b/tools/tracing/rtla/Build
new file mode 100644
index ..6c9d5b36a315
--- /dev/null
+++ b/tools/tracing/rtla/Build
@@ -0,0 +1 @@
+rtla-y += src/
diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile
index afd18c678ff5..1064b02bd735 100644
--- a/tools/tracing/rtla/Makefile
+++ b/tools/tracing/rtla/Makefile
@@ -1,157 +1,46 @@
-NAME   :=  rtla
-# Follow the kernel version
-VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. 
kernelversion | grep -v make)
+# SPDX-License-Identifier: GPL-2.0
+export srctree := $(abspath ../../..)
 
-# From libtracefs:
-# Makefiles suck: This macro sets a default value of $(2) for the
-# variable named by $(1), unless the variable has been set by
-# environment or command line. This is necessary for CC and AR
-# because make sets default values, so the simpler ?= approach
-# won't work as expected.
-define allow-override
-  $(if $(or $(findstring environment,$(origin $(1))),\
-$(findstring command line,$(origin $(1,,\
-$(eval $(1) = $(2)))
-endef
+VERSION:= $(shell sh -c "make -sC ../../.. kernelversion | 
grep -v make")
+DOCSRC := ../../../Documentation/tools/rtla/
 
-# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
-$(call allow-override,CC,$(CROSS_COMPILE)gcc)
-$(call allow-override,AR,$(CROSS_COMPILE)ar)
-$(call allow-override,STRIP,$(CROSS_COMPILE)strip)
-$(call allow-override,PKG_CONFIG,pkg-config)
-$(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/)
-$(call allow-override,LDCONFIG,ldconfig)
+FEATURE_TESTS  := libtraceevent
+FEATURE_TESTS  += libtracefs
+FEATURE_DISPLAY:= libtraceevent
+FEATURE_DISPLAY+= libtracefs
 
-INSTALL=   install
-MKDIR  =   mkdir
-FOPTS  :=  -flto=auto -ffat-lto-objects -fexceptions 
-fstack-protector-strong \
-   -fasynchronous-unwind-tables -fstack-clash-protection
-WOPTS  :=  -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 
-Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
-
-ifeq ($(CC),clang)
-  FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
-  WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
-endif
-
-TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs)
-
-CFLAGS :=  -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) 
$(TRACEFS_HEADERS) $(EXTRA_CFLAGS)
-LDFLAGS:=  -flto=auto -ggdb $(EXTRA_LDFLAGS)
-LIBS   :=  $$($(PKG_CONFIG) --libs libtracefs)
-
-SRC:=  $(wildcard src/*.c)
-HDR:=  $(wildcard src/*.h)
-OBJ:=  $(SRC:.c=.o)
-DIRS   :=  src
-FILES  :=  Makefile README.txt
-CEXT   :=  bz2
-TARBALL:=  $(NAME)-$(VERSION).tar.$(CEXT)
-TAROPTS:=  -cvjf $(TARBALL)
-BINDIR :=  /usr/bin
-DATADIR:=  /usr/share
-DOCDIR :=  $(DATADIR)/doc
-MANDIR

[PATCH 2/3] tools/verification: Use tools/build makefiles on rv

2024-03-08 Thread Daniel Bristot de Oliveira

Use tools/build/ makefiles to build rv, inheriting the benefits of
it. For example, having a proper way to handle dependencies.

Suggested-by: Linus Torvalds 
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/verification/rv/.gitignore  |   2 +
 tools/verification/rv/Build   |   1 +
 tools/verification/rv/Makefile| 166 +-
 tools/verification/rv/Makefile.config |  44 +++
 tools/verification/rv/Makefile.rv |  49 
 tools/verification/rv/src/Build   |   4 +
 6 files changed, 133 insertions(+), 133 deletions(-)
 create mode 100644 tools/verification/rv/.gitignore
 create mode 100644 tools/verification/rv/Build
 create mode 100644 tools/verification/rv/Makefile.config
 create mode 100644 tools/verification/rv/Makefile.rv
 create mode 100644 tools/verification/rv/src/Build

diff --git a/tools/verification/rv/.gitignore b/tools/verification/rv/.gitignore
new file mode 100644
index ..a446aca35eb2
--- /dev/null
+++ b/tools/verification/rv/.gitignore
@@ -0,0 +1,2 @@
+rv
+FEATURE-DUMP
diff --git a/tools/verification/rv/Build b/tools/verification/rv/Build
new file mode 100644
index ..a44c22349d4b
--- /dev/null
+++ b/tools/verification/rv/Build
@@ -0,0 +1 @@
+rv-y += src/
diff --git a/tools/verification/rv/Makefile b/tools/verification/rv/Makefile
index 485f8aeddbe0..da16b7c4db0f 100644
--- a/tools/verification/rv/Makefile
+++ b/tools/verification/rv/Makefile
@@ -1,146 +1,46 @@
-NAME   :=  rv
-# Follow the kernel version
-VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. 
kernelversion | grep -v make)
+# SPDX-License-Identifier: GPL-2.0
+export srctree := $(abspath ../../..)
 
-# From libtracefs:
-# Makefiles suck: This macro sets a default value of $(2) for the
-# variable named by $(1), unless the variable has been set by
-# environment or command line. This is necessary for CC and AR
-# because make sets default values, so the simpler ?= approach
-# won't work as expected.
-define allow-override
-  $(if $(or $(findstring environment,$(origin $(1))),\
-$(findstring command line,$(origin $(1,,\
-$(eval $(1) = $(2)))
-endef
+VERSION:= $(shell sh -c "make -sC ../../.. kernelversion | 
grep -v make")
+DOCSRC := ../../../Documentation/tools/rv/
 
-# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
-$(call allow-override,CC,$(CROSS_COMPILE)gcc)
-$(call allow-override,AR,$(CROSS_COMPILE)ar)
-$(call allow-override,STRIP,$(CROSS_COMPILE)strip)
-$(call allow-override,PKG_CONFIG,pkg-config)
-$(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/)
-$(call allow-override,LDCONFIG,ldconfig)
+FEATURE_TESTS  := libtraceevent
+FEATURE_TESTS  += libtracefs
+FEATURE_DISPLAY:= libtraceevent
+FEATURE_DISPLAY+= libtracefs
 
-INSTALL=   install
-MKDIR  =   mkdir
-FOPTS  :=  -flto=auto -ffat-lto-objects -fexceptions 
-fstack-protector-strong \
-   -fasynchronous-unwind-tables -fstack-clash-protection
-WOPTS  :=  -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 
-Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
-
-ifeq ($(CC),clang)
-  FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
-  WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
-endif
-
-TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs)
-
-CFLAGS :=  -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) 
$(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -I include
-LDFLAGS:=  -flto=auto -ggdb $(EXTRA_LDFLAGS)
-LIBS   :=  $$($(PKG_CONFIG) --libs libtracefs)
-
-SRC:=  $(wildcard src/*.c)
-HDR:=  $(wildcard src/*.h)
-OBJ:=  $(SRC:.c=.o)
-DIRS   :=  src
-FILES  :=  Makefile README.txt
-CEXT   :=  bz2
-TARBALL:=  $(NAME)-$(VERSION).tar.$(CEXT)
-TAROPTS:=  -cvjf $(TARBALL)
-BINDIR :=  /usr/bin
-DATADIR:=  /usr/share
-DOCDIR :=  $(DATADIR)/doc
-MANDIR :=  $(DATADIR)/man
-LICDIR :=  $(DATADIR)/licenses
-SRCTREE:=  $(or $(BUILD_SRC),$(CURDIR))
-
-# If running from the tarball, man pages are stored in the Documentation
-# dir. If running from the kernel source, man pages are stored in
-# Documentation/tools/rv/.
-ifneq ($(wildcard Documentation/.*),)
-DOCSRC =   Documentation/
+ifeq ($(V),1)
+  Q =
 else
-DOCSRC =   $(SRCTREE)/../../../Documentation/tools/rv/
-endif
-
-LIBTRACEEVENT_MIN_VERSION = 1.5
-LIBTRACEFS_MIN_VERSION = 1.3
-
-.PHONY:all warnings show_warnings
-all:   warnings rv
-
-TEST_LIBTRACEEVENT = $(shell sh -c "$(PKG_CONFIG) --atleast-version 
$(LIBTRACEEVENT_MIN_VERSION) libtraceevent > /dev/null 2>&1 || echo n")
-ifeq ("$(TEST_LIBTRACEEVENT)", "n")
-WARNINGS = show_warnings
-MISSING_LIBS += echo "**   libtraceevent version $(LIBTRACEEVENT_MIN_VERSION) 
or higher";
-MISSING_PACKAGES += "libtraceevent-devel"
-MISSING_

[PATCH 1/3] tools/tracing: Use tools/build makefiles on latency-collector

2024-03-08 Thread Daniel Bristot de Oliveira

Use tools/build/ makefiles to build latency-collector, inheriting
the benefits of it. For example, having a proper way to
handle dependencies.

Inspired on perf and objtool.

Suggested-by: Linus Torvalds 
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/tracing/latency/.gitignore  |  1 +
 tools/tracing/latency/Build   |  1 +
 tools/tracing/latency/Makefile| 48 +--
 tools/tracing/latency/Makefile.config | 28 
 4 files changed, 61 insertions(+), 17 deletions(-)
 create mode 100644 tools/tracing/latency/Build
 create mode 100644 tools/tracing/latency/Makefile.config

diff --git a/tools/tracing/latency/.gitignore b/tools/tracing/latency/.gitignore
index 0863960761e7..11490d9da0b3 100644
--- a/tools/tracing/latency/.gitignore
+++ b/tools/tracing/latency/.gitignore
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0
 latency-collector
+FEATURE-DUMP
diff --git a/tools/tracing/latency/Build b/tools/tracing/latency/Build
new file mode 100644
index ..0ce65ea72bf9
--- /dev/null
+++ b/tools/tracing/latency/Build
@@ -0,0 +1 @@
+latency-collector-y += latency-collector.o
diff --git a/tools/tracing/latency/Makefile b/tools/tracing/latency/Makefile
index 40c4ddaf8be1..3a2b5e8f5997 100644
--- a/tools/tracing/latency/Makefile
+++ b/tools/tracing/latency/Makefile
@@ -1,24 +1,38 @@
 # SPDX-License-Identifier: GPL-2.0
-# Makefile for vm tools
-#
-VAR_CFLAGS := $(shell pkg-config --cflags libtracefs 2>/dev/null)
-VAR_LDLIBS := $(shell pkg-config --libs libtracefs 2>/dev/null)
+export srctree := $(abspath ../../..)
+export CC  := gcc
+export LD  := ld
+export AR  := ar
+export PKG_CONFIG := pkg-config
 
-TARGETS = latency-collector
-CFLAGS = -Wall -Wextra -g -O2 $(VAR_CFLAGS)
-LDFLAGS = -lpthread $(VAR_LDLIBS)
+FEATURE_TESTS := libtraceevent
+FEATURE_TESTS += libtracefs
+FEATURE_DISPLAY := libtraceevent
+FEATURE_DISPLAY += libtracefs
 
-all: $(TARGETS)
+latency-collector:
 
-%: %.c
-   $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
+include $(srctree)/tools/build/Makefile.include
+include $(srctree)/tools/build/Makefile.feature
+include $(srctree)/tools/scripts/Makefile.include
+include Makefile.config
 
-clean:
-   $(RM) latency-collector
+CFLAGS += $(INCLUDES) $(LIB_INCLUDES)
+
+export CFLAGS := $(CFLAGS)
+
+latency-collector: latency-collector-in.o
+   $(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $^ $(EXTLIBS)
 
-prefix ?= /usr/local
-sbindir ?= ${prefix}/sbin
+latency-collector.%: fixdep FORCE
+   make -f $(srctree)/tools/build/Makefile.build dir=. $@
+
+latency-collector-in.o: fixdep FORCE
+   make $(build)=latency-collector
+
+clean:
+   $(call QUIET_CLEAN, latency-collector)
+   @find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' 
-delete
+   @rm -f latency-collector FEATURE-DUMP
 
-install: all
-   install -d $(DESTDIR)$(sbindir)
-   install -m 755 -p $(TARGETS) $(DESTDIR)$(sbindir)
+.PHONY: FORCE clean
diff --git a/tools/tracing/latency/Makefile.config 
b/tools/tracing/latency/Makefile.config
new file mode 100644
index ..5100d89346ab
--- /dev/null
+++ b/tools/tracing/latency/Makefile.config
@@ -0,0 +1,28 @@
+STOP_ERROR :=
+
+define lib_setup
+  $(eval EXTLIBS += -l$(1))
+  $(eval LIB_INCLUDES += $(shell sh -c "$(PKG_CONFIG) --cflags lib$(1)"))
+endef
+
+$(call feature_check,libtraceevent)
+ifeq ($(feature-libtraceevent), 1)
+  $(call detected,CONFIG_LIBTRACEEVENT)
+  $(call lib_setup,traceevent)
+else
+  STOP_ERROR := 1
+  $(info libtraceevent is missing. Please install 
libtraceevent-dev/libtraceevent-devel)
+endif
+
+$(call feature_check,libtracefs)
+ifeq ($(feature-libtracefs), 1)
+  $(call detected,CONFIG_LIBTRACEFS)
+  $(call lib_setup,tracefs)
+else
+  STOP_ERROR := 1
+  $(info libtracefs is missing. Please install libtracefs-dev/libtracefs-devel)
+endif
+
+ifeq ($(STOP_ERROR),1)
+  $(error Please, check the errors above.)
+endif
-- 
2.44.0

[PATCH 0/3] tools/tracing: Use tools/build makefiles like perf

2024-03-08 Thread Daniel Bristot de Oliveira

tools/tracing and tools/verification/rv are using standalone Makefiles.
However, This approach has some drawbacks. For example, code
duplication and lack of proper dependency handling, making things
harder for users. 

Linus suggested using perf's build system, and it is indeed the best way to go.

This series replaces tools/tracing and tools/verification/rv makefiles
with makefiles inspired by perf and objtool that use tools/build/
infrastructure.

Thanks, Arnaldo, for the pointers via chat.

Link: 
https://lore.kernel.org/lkml/CAHk-=wjq9bjkbpi3sjn2dy5jvwpo03u9aoc6-g8anlcgq-e...@mail.gmail.com/

Daniel Bristot de Oliveira (3):
  tools/tracing: Use tools/build makefiles on latency-collector
  tools/verification: Use tools/build makefiles on rv
  tools/rtla: Use tools/build makefiles to build rtla

 tools/tracing/latency/.gitignore   |   1 +
 tools/tracing/latency/Build|   1 +
 tools/tracing/latency/Makefile |  48 ---
 tools/tracing/latency/Makefile.config  |  28 
 tools/tracing/rtla/.gitignore  |   4 +-
 tools/tracing/rtla/Build   |   1 +
 tools/tracing/rtla/Makefile| 177 +
 tools/tracing/rtla/Makefile.config |  44 ++
 tools/tracing/rtla/Makefile.rtla   |  77 +++
 tools/tracing/rtla/Makefile.standalone |  24 
 tools/tracing/rtla/src/Build   |  11 ++
 tools/verification/rv/.gitignore   |   2 +
 tools/verification/rv/Build|   1 +
 tools/verification/rv/Makefile | 166 +--
 tools/verification/rv/Makefile.config  |  44 ++
 tools/verification/rv/Makefile.rv  |  49 +++
 tools/verification/rv/src/Build|   4 +
 17 files changed, 387 insertions(+), 295 deletions(-)
 create mode 100644 tools/tracing/latency/Build
 create mode 100644 tools/tracing/latency/Makefile.config
 create mode 100644 tools/tracing/rtla/Build
 create mode 100644 tools/tracing/rtla/Makefile.config
 create mode 100644 tools/tracing/rtla/Makefile.rtla
 create mode 100644 tools/tracing/rtla/Makefile.standalone
 create mode 100644 tools/tracing/rtla/src/Build
 create mode 100644 tools/verification/rv/.gitignore
 create mode 100644 tools/verification/rv/Build
 create mode 100644 tools/verification/rv/Makefile.config
 create mode 100644 tools/verification/rv/Makefile.rv
 create mode 100644 tools/verification/rv/src/Build

-- 
2.44.0

[GIT PULL] tracing/tools: Fixes for 6.8-rc4

2024-02-12 Thread Daniel Bristot de Oliveira

Steven,

Tracing tooling updates for 6.8-rc4

RTLA:
- rtla tools are exiting with a positive value  when usage()
  is called. Make them return 0 if the usage was called via
  -h/--help.

- the -P priority sets the sched priority for rtla workload.
  When the SCHED_OTHER scheduler is selected, it sets
  the rt_priority instead of the nice parameter. Setting
  the nice value is the correct thing, so fix it.

- rtla is failing to compile with clang due to unsupported
  options from gcc. Adjusting the compiler/linker options
  makes clang work properly.

- Remove the sched_getattr() unused function on utils.c.

- Fixes on variable initialization and size, reported by
  clang.

Verification:
- rv is failing to compile with clang due to unsupported
  options from gcc. Adjusting the compiler/linker options
  makes clang work properly.

- Fix an uninitialized variable on in_kernel.c reported by
  clang.


Please pull the latest trace-tools-fixes-v6.8-rc4 tree, which can be found at:


  git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git
trace-tools-fixes-v6.8-rc4

Tag SHA1: 63f0522db203c7d138595e760b9a237bc02990fa
Head SHA1: b5f319360371087d52070d8f3fc7789e80ce69a6

Daniel Bristot de Oliveira (6):
  tools/rtla: Fix Makefile compiler options for clang
  tools/rtla: Fix uninitialized bucket/data->bucket_size warning
  tools/rtla: Fix clang warning about mount_point var size
  tools/rtla: Remove unused sched_getattr() function
  tools/rv: Fix Makefile compiler options for clang
  tools/rv: Fix curr_reactor uninitialized variable

John Kacur (1):
  tools/rtla: Exit with EXIT_SUCCESS when help is invoked

limingming3 (1):
  tools/rtla: Replace setting prio with nice for SCHED_OTHER


 tools/tracing/rtla/Makefile|  7 ++-
 tools/tracing/rtla/src/osnoise_hist.c  |  9 ++---
 tools/tracing/rtla/src/osnoise_top.c   |  6 +-
 tools/tracing/rtla/src/timerlat_hist.c |  9 ++---
 tools/tracing/rtla/src/timerlat_top.c  |  6 +-
 tools/tracing/rtla/src/utils.c | 14 --
 tools/tracing/rtla/src/utils.h |  2 ++
 tools/verification/rv/Makefile |  7 ++-
 tools/verification/rv/src/in_kernel.c  |  2 +-
 9 files changed, 41 insertions(+), 21 deletions(-)
---
diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile
index 2456a399eb9a..afd18c678ff5 100644
--- a/tools/tracing/rtla/Makefile
+++ b/tools/tracing/rtla/Makefile
@@ -28,10 +28,15 @@ FOPTS   :=  -flto=auto -ffat-lto-objects 
-fexceptions -fstack-protector-strong \
-fasynchronous-unwind-tables -fstack-clash-protection
 WOPTS  :=  -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 
-Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
 
+ifeq ($(CC),clang)
+  FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
+  WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
+endif
+
 TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs)
 
 CFLAGS :=  -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) 
$(TRACEFS_HEADERS) $(EXTRA_CFLAGS)
-LDFLAGS:=  -ggdb $(EXTRA_LDFLAGS)
+LDFLAGS:=  -flto=auto -ggdb $(EXTRA_LDFLAGS)
 LIBS   :=  $$($(PKG_CONFIG) --libs libtracefs)
 
 SRC:=  $(wildcard src/*.c)
diff --git a/tools/tracing/rtla/src/osnoise_hist.c 
b/tools/tracing/rtla/src/osnoise_hist.c
index 8f81fa007364..01870d50942a 100644
--- a/tools/tracing/rtla/src/osnoise_hist.c
+++ b/tools/tracing/rtla/src/osnoise_hist.c
@@ -135,8 +135,7 @@ static void osnoise_hist_update_multiple(struct 
osnoise_tool *tool, int cpu,
if (params->output_divisor)
duration = duration / params->output_divisor;
 
-   if (data->bucket_size)
-   bucket = duration / data->bucket_size;
+   bucket = duration / data->bucket_size;
 
total_duration = duration * count;
 
@@ -480,7 +479,11 @@ static void osnoise_hist_usage(char *usage)
 
for (i = 0; msg[i]; i++)
fprintf(stderr, "%s\n", msg[i]);
-   exit(1);
+
+   if (usage)
+   exit(EXIT_FAILURE);
+
+   exit(EXIT_SUCCESS);
 }
 
 /*
diff --git a/tools/tracing/rtla/src/osnoise_top.c 
b/tools/tracing/rtla/src/osnoise_top.c
index f7c959be8677..457360db0767 100644
--- a/tools/tracing/rtla/src/osnoise_top.c
+++ b/tools/tracing/rtla/src/osnoise_top.c
@@ -331,7 +331,11 @@ static void osnoise_top_usage(struct osnoise_top_params 
*params, char *usage)
 
for (i = 0; msg[i]; i++)
fprintf(stderr, "%s\n", msg[i]);
-   exit(1);
+
+   if (usage)
+   exit(EXIT_FAILURE);
+
+   exit(EXIT_SUCCESS);
 }
 
 /*
diff --git a/tools/tracing/rtla/src/timerlat_hist.c 
b/tools/tracing/rtla/src/timerlat_hist.c
index 47d3d8b53cb2..

Re: [PATCH] tools/rtla: Replace setting prio with nice for SCHED_OTHER

2024-02-08 Thread Daniel Bristot de Oliveira

On 2/7/24 07:51, limingming3 wrote:
> Since the sched_priority for SCHED_OTHER is always 0, it makes no
> sence to set it.
> Setting nice for SCHED_OTHER seems more meaningful.

Thanks!

This is actually a fix, I meant to set nice since the beginning.

-- Daniel

Re: [PATCH 1/6] tools/rtla: Fix Makefile compiler options for clang

2024-02-08 Thread Daniel Bristot de Oliveira

On 2/6/24 16:48, Nathan Chancellor wrote:
> On Tue, Feb 06, 2024 at 12:05:29PM +0100, Daniel Bristot de Oliveira wrote:
>> The following errors are showing up when compiling rtla with clang:
>>
>>  $ make HOSTCC=clang CC=clang LLVM_IAS=1
>>  [...]
>>
>>   clang -O -g -DVERSION=\"6.8.0-rc1\" -flto=auto -ffat-lto-objects
>>  -fexceptions -fstack-protector-strong
>>  -fasynchronous-unwind-tables -fstack-clash-protection  -Wall
>>  -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
>>  -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
>>  $(pkg-config --cflags libtracefs)-c -o src/utils.o src/utils.c
>>
>>   clang: warning: optimization flag '-ffat-lto-objects' is not supported 
>> [-Wignored-optimization-argument]
> 
> For what it's worth, this flag is supported in clang 17.0.0 and newer:
> 
> https://github.com/llvm/llvm-project/commit/610fc5cbcc8b68879c562f6458608afe2473ab7f

Good! still, I am getting this error on fedora, with this clang version:

bristot@x1:~/src/git/linux/tools/tracing/rtla$ clang --version
clang version 17.0.6 (Fedora 17.0.6-1.fc39)
Target: x86_64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

:-(

> But if it is not critical, just dropping the flag like you have done
> here rather than conditionally supporting it is probably easier.

Yeah, I will drop it for now, and keep monitoring.

Thanks Natan!
-- Daniel

[PATCH 6/6] tools/rv: Fix curr_reactor uninitialized variable

2024-02-06 Thread Daniel Bristot de Oliveira

clang is reporting:

$ make HOSTCC=clang CC=clang LLVM_IAS=1

clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
-fstack-protector-strong -fasynchronous-unwind-tables
-fstack-clash-protection  -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
$(pkg-config --cflags libtracefs)  -I include
-c -o src/in_kernel.o src/in_kernel.c
[...]

src/in_kernel.c:227:6: warning: variable 'curr_reactor' is used uninitialized 
whenever 'if' condition is true [-Wsometimes-uninitialized]
  227 | if (!end)
  | ^~~~
src/in_kernel.c:242:9: note: uninitialized use occurs here
  242 | return curr_reactor;
  |^~~~
src/in_kernel.c:227:2: note: remove the 'if' if its condition is always false
  227 | if (!end)
  | ^
  228 | goto out_free;
  | ~
src/in_kernel.c:221:6: warning: variable 'curr_reactor' is used uninitialized 
whenever 'if' condition is true [-Wsometimes-uninitialized]
  221 | if (!start)
  | ^~
src/in_kernel.c:242:9: note: uninitialized use occurs here
  242 | return curr_reactor;
  |^~~~
src/in_kernel.c:221:2: note: remove the 'if' if its condition is always false
  221 | if (!start)
  | ^~~
  222 | goto out_free;
  | ~
src/in_kernel.c:215:20: note: initialize the variable 'curr_reactor' to silence 
this warning
  215 | char *curr_reactor;
  |   ^
  |= NULL
2 warnings generated.

Which is correct. Setting curr_reactor to NULL avoids the problem.

Cc: sta...@vger.kernel.org
Fixes: 6d60f89691fc ("tools/rv: Add in-kernel monitor interface")
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/verification/rv/src/in_kernel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/verification/rv/src/in_kernel.c 
b/tools/verification/rv/src/in_kernel.c
index ad28582bcf2b..f04479ecc96c 100644
--- a/tools/verification/rv/src/in_kernel.c
+++ b/tools/verification/rv/src/in_kernel.c
@@ -210,9 +210,9 @@ static char *ikm_read_reactor(char *monitor_name)
 static char *ikm_get_current_reactor(char *monitor_name)
 {
char *reactors = ikm_read_reactor(monitor_name);
+   char *curr_reactor = NULL;
char *start;
char *end;
-   char *curr_reactor;
 
if (!reactors)
return NULL;
-- 
2.43.0

[PATCH 5/6] tools/rv: Fix Makefile compiler options for clang

2024-02-06 Thread Daniel Bristot de Oliveira

The following errors are showing up when compiling rv with clang:

 $ make HOSTCC=clang CC=clang LLVM_IAS=1
 [...]
  clang -O -g -DVERSION=\"6.8.0-rc1\" -flto=auto -ffat-lto-objects
  -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables
  -fstack-clash-protection  -Wall -Werror=format-security
  -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
  -Wno-maybe-uninitialized $(pkg-config --cflags libtracefs)
  -I include   -c -o src/utils.o src/utils.c
  clang: warning: optimization flag '-ffat-lto-objects' is not supported 
[-Wignored-optimization-argument]
  warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean 
'-Wno-uninitialized'? [-Wunknown-warning-option]
  1 warning generated.

  clang -o rv -ggdb  src/in_kernel.o src/rv.o src/trace.o src/utils.o 
$(pkg-config --libs libtracefs)
  src/in_kernel.o: file not recognized: file format not recognized
  clang: error: linker command failed with exit code 1 (use -v to see 
invocation)
  make: *** [Makefile:110: rv] Error 1

Solve these issues by:
  - removing -ffat-lto-objects and -Wno-maybe-uninitialized if using clang
  - informing the linker about -flto=auto

Cc: sta...@vger.kernel.org
Fixes: 4bc4b131d44c ("rv: Add rv tool")
Suggested-by: Donald Zickus 
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/verification/rv/Makefile | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/verification/rv/Makefile b/tools/verification/rv/Makefile
index 3d0f3888a58c..485f8aeddbe0 100644
--- a/tools/verification/rv/Makefile
+++ b/tools/verification/rv/Makefile
@@ -28,10 +28,15 @@ FOPTS   :=  -flto=auto -ffat-lto-objects 
-fexceptions -fstack-protector-strong \
-fasynchronous-unwind-tables -fstack-clash-protection
 WOPTS  :=  -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 
-Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
 
+ifeq ($(CC),clang)
+  FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
+  WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
+endif
+
 TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs)
 
 CFLAGS :=  -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) 
$(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -I include
-LDFLAGS:=  -ggdb $(EXTRA_LDFLAGS)
+LDFLAGS:=  -flto=auto -ggdb $(EXTRA_LDFLAGS)
 LIBS   :=  $$($(PKG_CONFIG) --libs libtracefs)
 
 SRC:=  $(wildcard src/*.c)
-- 
2.43.0

[PATCH 4/6] tools/rtla: Remove unused sched_getattr() function

2024-02-06 Thread Daniel Bristot de Oliveira

Clang is reporting:

$ make HOSTCC=clang CC=clang LLVM_IAS=1
[...]
clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions 
-fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection  
-Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS 
$(pkg-config --cflags libtracefs)-c -o src/utils.o src/utils.c
src/utils.c:241:19: warning: unused function 'sched_getattr' [-Wunused-function]
  241 | static inline int sched_getattr(pid_t pid, struct sched_attr *attr,
  |   ^
1 warning generated.

Which is correct, so remove the unused function.

Cc: sta...@vger.kernel.org
Fixes: b1696371d865 ("rtla: Helper functions for rtla")
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/tracing/rtla/src/utils.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index b998b24edf1e..5fcd6495ff05 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -238,12 +238,6 @@ static inline int sched_setattr(pid_t pid, const struct 
sched_attr *attr,
return syscall(__NR_sched_setattr, pid, attr, flags);
 }
 
-static inline int sched_getattr(pid_t pid, struct sched_attr *attr,
-   unsigned int size, unsigned int flags)
-{
-   return syscall(__NR_sched_getattr, pid, attr, size, flags);
-}
-
 int __set_sched_attr(int pid, struct sched_attr *attr)
 {
int flags = 0;
-- 
2.43.0

[PATCH 3/6] tools/rtla: Fix clang warning about mount_point var size

2024-02-06 Thread Daniel Bristot de Oliveira

clang is reporting this warning:

$ make HOSTCC=clang CC=clang LLVM_IAS=1
[...]
clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
-fstack-protector-strong -fasynchronous-unwind-tables
-fstack-clash-protection  -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
$(pkg-config --cflags libtracefs)-c -o src/utils.o src/utils.c

src/utils.c:548:66: warning: 'fscanf' may overflow; destination buffer in 
argument 3 has size 1024, but the corresponding specifier may require size 1025 
[-Wfortify-source]
  548 | while (fscanf(fp, "%*s %" STR(MAX_PATH) "s %99s %*s %*d %*d\n", 
mount_point, type) == 2) {
  | 
^

Increase mount_point variable size to MAX_PATH+1 to avoid the overflow.

Cc: sta...@vger.kernel.org
Fixes: a957cbc02531 ("rtla: Add -C cgroup support")
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/tracing/rtla/src/utils.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index c769d7b3842c..b998b24edf1e 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -536,7 +536,7 @@ int set_cpu_dma_latency(int32_t latency)
  */
 static const int find_mount(const char *fs, char *mp, int sizeof_mp)
 {
-   char mount_point[MAX_PATH];
+   char mount_point[MAX_PATH+1];
char type[100];
int found = 0;
FILE *fp;
-- 
2.43.0

[PATCH 2/6] tools/rtla: Fix uninitialized bucket/data->bucket_size warning

2024-02-06 Thread Daniel Bristot de Oliveira

When compiling rtla with clang, I am getting the following warnings:

$ make HOSTCC=clang CC=clang LLVM_IAS=1

[..]
clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
-fstack-protector-strong -fasynchronous-unwind-tables
-fstack-clash-protection  -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
$(pkg-config --cflags libtracefs)
-c -o src/osnoise_hist.o src/osnoise_hist.c
src/osnoise_hist.c:138:6: warning: variable 'bucket' is used uninitialized 
whenever 'if' condition is false [-Wsometimes-uninitialized]
  138 | if (data->bucket_size)
  | ^
src/osnoise_hist.c:149:6: note: uninitialized use occurs here
  149 | if (bucket < entries)
  | ^~
src/osnoise_hist.c:138:2: note: remove the 'if' if its condition is always true
  138 | if (data->bucket_size)
  | ^~
  139 | bucket = duration / data->bucket_size;
src/osnoise_hist.c:132:12: note: initialize the variable 'bucket' to silence 
this warning
  132 | int bucket;
  |   ^
  |= 0
1 warning generated.

[...]

clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
-fstack-protector-strong -fasynchronous-unwind-tables
-fstack-clash-protection  -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
$(pkg-config --cflags libtracefs)
-c -o src/timerlat_hist.o src/timerlat_hist.c
src/timerlat_hist.c:181:6: warning: variable 'bucket' is used uninitialized 
whenever 'if' condition is false [-Wsometimes-uninitialized]
  181 | if (data->bucket_size)
  | ^
src/timerlat_hist.c:204:6: note: uninitialized use occurs here
  204 | if (bucket < entries)
  | ^~
src/timerlat_hist.c:181:2: note: remove the 'if' if its condition is always true
  181 | if (data->bucket_size)
  | ^~
  182 | bucket = latency / data->bucket_size;
src/timerlat_hist.c:175:12: note: initialize the variable 'bucket' to silence 
this warning
  175 | int bucket;
  |   ^
  |= 0
1 warning generated.

This is a legit warning, but data->bucket_size is always > 0 (see
timerlat_hist_parse_args()), so the if is not necessary.

Remove the unneeded if (data->bucket_size) to avoid the warning.

Cc: sta...@vger.kernel.org
Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode")
Fixes: 829a6c0b5698 ("rtla/osnoise: Add the hist mode")
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/tracing/rtla/src/osnoise_hist.c  | 3 +--
 tools/tracing/rtla/src/timerlat_hist.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/tracing/rtla/src/osnoise_hist.c 
b/tools/tracing/rtla/src/osnoise_hist.c
index 8f81fa007364..67128171c29d 100644
--- a/tools/tracing/rtla/src/osnoise_hist.c
+++ b/tools/tracing/rtla/src/osnoise_hist.c
@@ -135,8 +135,7 @@ static void osnoise_hist_update_multiple(struct 
osnoise_tool *tool, int cpu,
if (params->output_divisor)
duration = duration / params->output_divisor;
 
-   if (data->bucket_size)
-   bucket = duration / data->bucket_size;
+   bucket = duration / data->bucket_size;
 
total_duration = duration * count;
 
diff --git a/tools/tracing/rtla/src/timerlat_hist.c 
b/tools/tracing/rtla/src/timerlat_hist.c
index 47d3d8b53cb2..3a5b8c409e7d 100644
--- a/tools/tracing/rtla/src/timerlat_hist.c
+++ b/tools/tracing/rtla/src/timerlat_hist.c
@@ -178,8 +178,7 @@ timerlat_hist_update(struct osnoise_tool *tool, int cpu,
if (params->output_divisor)
latency = latency / params->output_divisor;
 
-   if (data->bucket_size)
-   bucket = latency / data->bucket_size;
+   bucket = latency / data->bucket_size;
 
if (!context) {
hist = data->hist[cpu].irq;
-- 
2.43.0

[PATCH 1/6] tools/rtla: Fix Makefile compiler options for clang

2024-02-06 Thread Daniel Bristot de Oliveira

The following errors are showing up when compiling rtla with clang:

 $ make HOSTCC=clang CC=clang LLVM_IAS=1
 [...]

  clang -O -g -DVERSION=\"6.8.0-rc1\" -flto=auto -ffat-lto-objects
-fexceptions -fstack-protector-strong
-fasynchronous-unwind-tables -fstack-clash-protection  -Wall
-Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
-Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
$(pkg-config --cflags libtracefs)-c -o src/utils.o src/utils.c

  clang: warning: optimization flag '-ffat-lto-objects' is not supported 
[-Wignored-optimization-argument]
  warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean 
'-Wno-uninitialized'? [-Wunknown-warning-option]
  1 warning generated.

  clang -o rtla -ggdb  src/osnoise.o src/osnoise_hist.o src/osnoise_top.o
  src/rtla.o src/timerlat_aa.o src/timerlat.o src/timerlat_hist.o
  src/timerlat_top.o src/timerlat_u.o src/trace.o src/utils.o $(pkg-config 
--libs libtracefs)

  src/osnoise.o: file not recognized: file format not recognized
  clang: error: linker command failed with exit code 1 (use -v to see 
invocation)
  make: *** [Makefile:110: rtla] Error 1

Solve these issues by:
  - removing -ffat-lto-objects and -Wno-maybe-uninitialized if using clang
  - informing the linker about -flto=auto

Cc: sta...@vger.kernel.org
Fixes: 1a7b22ab15eb ("tools/rtla: Build with EXTRA_{C,LD}FLAGS")
Suggested-by: Donald Zickus 
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/tracing/rtla/Makefile | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile
index 2456a399eb9a..afd18c678ff5 100644
--- a/tools/tracing/rtla/Makefile
+++ b/tools/tracing/rtla/Makefile
@@ -28,10 +28,15 @@ FOPTS   :=  -flto=auto -ffat-lto-objects 
-fexceptions -fstack-protector-strong \
-fasynchronous-unwind-tables -fstack-clash-protection
 WOPTS  :=  -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 
-Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
 
+ifeq ($(CC),clang)
+  FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
+  WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
+endif
+
 TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs)
 
 CFLAGS :=  -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) 
$(TRACEFS_HEADERS) $(EXTRA_CFLAGS)
-LDFLAGS:=  -ggdb $(EXTRA_LDFLAGS)
+LDFLAGS:=  -flto=auto -ggdb $(EXTRA_LDFLAGS)
 LIBS   :=  $$($(PKG_CONFIG) --libs libtracefs)
 
 SRC:=  $(wildcard src/*.c)
-- 
2.43.0

[PATCH 0/6] tools: Fix rtla and rv problems (found) with clang

2024-02-06 Thread Daniel Bristot de Oliveira

RHEL people reported some errors when compiling rtla and rv with
clang. The command line used to compile the tools is:

$ make HOSTCC=clang CC=clang LLVM_IAS=1

The first problem is two unsupported flags passed to the compiler:
-ffat-lto-objects and -Wno-maybe-uninitialized. They will be
removed if the compile is clang.

Also, the clang linker does not automatically recognize the
-flto=auto option used at compilation time, so it is explicitly
set.

With the compiler working, it starts pointing to some warnings
and errors about uninitialized variables, variable size, and an
unused function. These problems are also fixed.


Daniel Bristot de Oliveira (6):
  tools/rtla: Fix Makefile compiler options for clang
  tools/rtla: Fix uninitialized bucket/data->bucket_size warning
  tools/rtla: Fix clang warning about mount_point var size
  tools/rtla: Remove unused sched_getattr() function
  tools/rv: Fix Makefile compiler options for clang
  tools/rv: Fix curr_reactor uninitialized variable

 tools/tracing/rtla/Makefile| 7 ++-
 tools/tracing/rtla/src/osnoise_hist.c  | 3 +--
 tools/tracing/rtla/src/timerlat_hist.c | 3 +--
 tools/tracing/rtla/src/utils.c | 8 +---
 tools/verification/rv/Makefile | 7 ++-
 tools/verification/rv/src/in_kernel.c  | 2 +-
 6 files changed, 16 insertions(+), 14 deletions(-)

-- 
2.43.0

Re: [PATCH] tracing/timerlat: Move hrtimer_init to timerlat_fd open()

2024-02-01 Thread Daniel Bristot de Oliveira

On 2/1/24 16:44, Greg KH wrote:
> On Thu, Feb 01, 2024 at 04:13:39PM +0100, Daniel Bristot de Oliveira wrote:
>> Currently, the timerlat's hrtimer is initialized at the first read of
>> timerlat_fd, and destroyed at close(). It works, but it causes an error
>> if the user program open() and close() the file without reading.
> 
> What error exactly happens?  Userspace, or the kernel crashes?

sorry, kernel crash:

# echo NO_OSNOISE_WORKLOAD > /sys/kernel/debug/tracing/osnoise/options
# echo timerlat > /sys/kernel/debug/tracing/current_tracer

# cat ./timerlat_load.py
#!/usr/bin/env python3

timerlat_fd = open("/sys/kernel/tracing/osnoise/per_cpu/cpu0/timerlat_fd", 'r')
timerlat_fd.close();

# ./taskset -c 0 ./timerlat_load.py


[ 6401.611374] BUG: kernel NULL pointer dereference, address: 0010
[ 6401.611786] #PF: supervisor read access in kernel mode
[ 6401.612081] #PF: error_code(0x) - not-present page
[ 6401.612376] PGD 0 P4D 0
[ 6401.612495] Oops:  [#1] PREEMPT SMP NOPTI
[ 6401.612690] CPU: 1 PID: 2673 Comm: python3 Not tainted 
6.6.13-200.fc39.x86_64 #1
[ 6401.613011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.3-1.fc39 04/01/2014
[ 6401.613379] RIP: 0010:hrtimer_active+0xd/0x50
[ 6401.613577] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 
90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 57 30 <8b> 42 10 
a8 01 74 09 f3 90 8b 42 10 a8 01 75 f7 80 7f 38 00 75 1d
[ 6401.614374] RSP: 0018:b031009b7e10 EFLAGS: 00010286
[ 6401.614604] RAX: 0002db00 RBX: 9118f786db08 RCX: 
[ 6401.614914] RDX:  RSI: 9117a0e64400 RDI: 9118f786db08
[ 6401.615225] RBP: 9118f786db80 R08: 9117a0ddd420 R09: 9117804d4f70
[ 6401.615534] R10:  R11:  R12: 9118f786db08
[ 6401.615846] R13: 91178fdd5e20 R14: 9117840978c0 R15: 
[ 6401.616157] FS:  7f2ffbab1740() GS:9118f784() 
knlGS:
[ 6401.616508] CS:  0010 DS:  ES:  CR0: 80050033
[ 6401.616765] CR2: 0010 CR3: 0001b402e000 CR4: 00750ee0
[ 6401.617075] PKRU: 5554
[ 6401.617197] Call Trace:
[ 6401.617309]  
[ 6401.617407]  ? __die+0x23/0x70
[ 6401.617548]  ? page_fault_oops+0x171/0x4e0
[ 6401.617983]  ? srso_alias_return_thunk+0x5/0x7f
[ 6401.618389]  ? avc_has_extended_perms+0x237/0x520
[ 6401.618800]  ? exc_page_fault+0x7f/0x180
[ 6401.619176]  ? asm_exc_page_fault+0x26/0x30
[ 6401.619563]  ? hrtimer_active+0xd/0x50
[ 6401.619926]  hrtimer_cancel+0x15/0x40
[ 6401.620286]  timerlat_fd_release+0x48/0xe0
[ 6401.620666]  __fput+0xf5/0x290
[ 6401.621004]  __x64_sys_close+0x3d/0x80
[ 6401.621370]  do_syscall_64+0x60/0x90
[ 6401.621730]  ? srso_alias_return_thunk+0x5/0x7f
[ 6401.622129]  ? __x64_sys_ioctl+0x72/0xd0
[ 6401.622503]  ? srso_alias_return_thunk+0x5/0x7f
[ 6401.622902]  ? syscall_exit_to_user_mode+0x2b/0x40
[ 6401.623309]  ? srso_alias_return_thunk+0x5/0x7f
[ 6401.623703]  ? do_syscall_64+0x6c/0x90
[ 6401.624063]  ? srso_alias_return_thunk+0x5/0x7f
[ 6401.624457]  ? exit_to_user_mode_prepare+0x142/0x1f0
[ 6401.624868]  ? srso_alias_return_thunk+0x5/0x7f
[ 6401.625262]  ? syscall_exit_to_user_mode+0x2b/0x40
[ 6401.625663]  ? srso_alias_return_thunk+0x5/0x7f
[ 6401.626051]  ? do_syscall_64+0x6c/0x90
[ 6401.626404]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 6401.626810] RIP: 0033:0x7f2ffb321594
[ 6401.627164] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 
00 00 90 f3 0f 1e fa 80 3d d5 cd 0d 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 3c c3 0f 1f 00 55 48 89 e5 48 83 ec 10 89 7d
[ 6401.628345] RSP: 002b:7ffe8d8eef18 EFLAGS: 0202 ORIG_RAX: 
0003
[ 6401.628867] RAX: ffda RBX: 7f2ffba4e668 RCX: 7f2ffb321594
[ 6401.629372] RDX:  RSI:  RDI: 0003
[ 6401.629879] RBP: 7ffe8d8eef40 R08:  R09: 
[ 6401.630384] R10: 55c926e3167eae79 R11: 0202 R12: 0003
[ 6401.630889] R13: 7ffe8d8ef030 R14:  R15: 7f2ffba4e668
[ 6401.631394]  
[ 6401.631691] Modules linked in: tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct 
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set 
nf_tables nfnetlink qrtr sunrpc pktcdvd intel_rapl_msr snd_hda_codec_generic 
intel_rapl_common ledtrig_audio snd_hda_intel snd_intel_dspcfg 
snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm kvm_amd 
iTCO_wdt snd_timer intel_pmc_bxt ccp joydev iTCO_vendor_support kvm irqbypass 
i2c_i801 snd i2c_smbus soundcore lpc_ich virtio_balloon loop zram 
crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic 
ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_net virtio_gpu 
virtio_console vi

Re: [PATCH] tracing/timerlat: Move hrtimer_init to timerlat_fd open()

2024-02-01 Thread Daniel Bristot de Oliveira

On 2/1/24 16:25, Steven Rostedt wrote:
> On Thu,  1 Feb 2024 16:13:39 +0100
> Daniel Bristot de Oliveira  wrote:
> 
>> Currently, the timerlat's hrtimer is initialized at the first read of
>> timerlat_fd, and destroyed at close(). It works, but it causes an error
>> if the user program open() and close() the file without reading.
>>
>> Move hrtimer_init to timerlat_fd open() to avoid this problem.
>>
>> No functional changes.
> 
> It can't be fixing something and not have any functional changes.
> 
> No functional changes means the code is restructured but the resulting
> assembly would be the same.
> 
> Like moving functions around in a file so that you don't need extra
> prototype declarations.
> 
> Please only add "No functional changes" if the function's assembly would be
> the same.

ok

>>
>> Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface")
> 
> With a fixes tag, I'm assuming his should go into v6.8 with a Cc stable?

right, I added it on Cc, but did not include the Cc:.. tag. It seems that I 
should have.

-- Daniel

> -- Steve
> 
> 
>> Signed-off-by: Daniel Bristot de Oliveira 
>> ---
>>  kernel/trace/trace_osnoise.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
>> index bd0d01d00fb9..a8e28f9b9271 100644
>> --- a/kernel/trace/trace_osnoise.c
>> +++ b/kernel/trace/trace_osnoise.c
>> @@ -2444,6 +2444,9 @@ static int timerlat_fd_open(struct inode *inode, 
>> struct file *file)
>>  tlat = this_cpu_tmr_var();
>>  tlat->count = 0;
>>  
>> +hrtimer_init(>timer, CLOCK_MONOTONIC, 
>> HRTIMER_MODE_ABS_PINNED_HARD);
>> +tlat->timer.function = timerlat_irq;
>> +
>>  migrate_enable();
>>  return 0;
>>  };
>> @@ -2526,9 +2529,6 @@ timerlat_fd_read(struct file *file, char __user *ubuf, 
>> size_t count,
>>  tlat->tracing_thread = false;
>>  tlat->kthread = current;
>>  
>> -hrtimer_init(>timer, CLOCK_MONOTONIC, 
>> HRTIMER_MODE_ABS_PINNED_HARD);
>> -tlat->timer.function = timerlat_irq;
>> -
>>  /* Annotate now to drift new period */
>>  tlat->abs_period = hrtimer_cb_get_time(>timer);
>>  
>

[PATCH] tracing/timerlat: Move hrtimer_init to timerlat_fd open()

2024-02-01 Thread Daniel Bristot de Oliveira

Currently, the timerlat's hrtimer is initialized at the first read of
timerlat_fd, and destroyed at close(). It works, but it causes an error
if the user program open() and close() the file without reading.

Move hrtimer_init to timerlat_fd open() to avoid this problem.

No functional changes.

Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface")
Signed-off-by: Daniel Bristot de Oliveira 
---
 kernel/trace/trace_osnoise.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index bd0d01d00fb9..a8e28f9b9271 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2444,6 +2444,9 @@ static int timerlat_fd_open(struct inode *inode, struct 
file *file)
tlat = this_cpu_tmr_var();
tlat->count = 0;
 
+   hrtimer_init(>timer, CLOCK_MONOTONIC, 
HRTIMER_MODE_ABS_PINNED_HARD);
+   tlat->timer.function = timerlat_irq;
+
migrate_enable();
return 0;
 };
@@ -2526,9 +2529,6 @@ timerlat_fd_read(struct file *file, char __user *ubuf, 
size_t count,
tlat->tracing_thread = false;
tlat->kthread = current;
 
-   hrtimer_init(>timer, CLOCK_MONOTONIC, 
HRTIMER_MODE_ABS_PINNED_HARD);
-   tlat->timer.function = timerlat_irq;
-
/* Annotate now to drift new period */
tlat->abs_period = hrtimer_cb_get_time(>timer);
 
-- 
2.43.0

[GIT PULL] tracing/tools: Updates for 6.7

2023-10-30 Thread Daniel Bristot de Oliveira

Steven,

Tracing tools changes for 6.7:

RTLA:
  - On rtla/utils.c, initialize the 'found' variable to avoid garbage
when a mount point is not found.

Verification:
  - Remove duplicated imports on dot2k python script


Please pull the latest tracing-tools-v6.7 tree, which can be found at:


  git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git
tracing-tools-v6.7

Tag SHA1: eadb7e76bc7ce4174d4905a29a4a07cd835508fe
Head SHA1: 696444a544ecd6d62c1edc89516b376cefb28929


Alessandro Carminati (Red Hat) (1):
  verification/dot2k: Delete duplicate imports

Colin Ian King (1):
  rtla: Fix uninitialized variable found


 tools/tracing/rtla/src/utils.c | 2 +-
 tools/verification/dot2/dot2k  | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)
---
diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index 623a38908ed5..c769d7b3842c 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -538,7 +538,7 @@ static const int find_mount(const char *fs, char *mp, int 
sizeof_mp)
 {
char mount_point[MAX_PATH];
char type[100];
-   int found;
+   int found = 0;
FILE *fp;
 
fp = fopen("/proc/mounts", "r");
diff --git a/tools/verification/dot2/dot2k b/tools/verification/dot2/dot2k
index 9dcd38abe20a..d4d7e52d549e 100644
--- a/tools/verification/dot2/dot2k
+++ b/tools/verification/dot2/dot2k
@@ -15,8 +15,6 @@ if __name__ == '__main__':
 import os
 import platform
 import sys
-import sys
-import argparse
 
 parser = argparse.ArgumentParser(description='transform .dot file into 
kernel rv monitor')
 parser.add_argument('-d', "--dot", dest="dot_file", required=True)

Re: [PATCH v5] scripts/link-vmlinux.sh: Add alias to duplicate symbols for kallsyms

2023-10-16 Thread Daniel Bristot de Oliveira



>> In any case, as you've pointed out, duplicates can arise from names in code
>> that is not intended to be a module.
>> Therefore, relying solely on the module name would not fully address the
>> problem you initially aimed to solve.
> 
> From my POV:
> 
> The source path and the line number is enough to distinguish duplicate
> symbols even in modules.
> 
> The added module name would just add extra complexity into the kernel
> and tools parsing and using the alias. The tracing tools would need to
> handle the source path and line number anyway for symbols duplicated
> within same module/vmlinux.
> 
> Adding module name for builtin modules might be misleading. It won't
> be clear which symbols are in vmlinux binary and which are in
> real modules.

+1

-- Daniel

Re: [PATCH v1] Ftrace: make sched_wakeup can focus on the target process

2023-10-10 Thread Daniel Bristot de Oliveira

On 10/9/23 17:37, Jinyu Tang wrote:
> $ cyclictest --mlockall --smp --priority=99 &

rtla timerlat -a 

will give you an structured analysis of your latency...

https://bristot.me/linux-scheduling-latency-debug-and-analysis/

-- Daniel

Re: [PATCH V3] tracing/timerlat: Hotplug support for the user-space interface

2023-10-04 Thread Daniel Bristot de Oliveira

On 10/4/23 03:03, Steven Rostedt wrote:
> On Fri, 29 Sep 2023 17:02:46 +0200
> Daniel Bristot de Oliveira  wrote:
> 
>> The osnoise/per_cpu/CPU$/timerlat_fd is create for each possible
>> CPU, but it might create confusion if the CPU is not online.
>>
>> Create the file only for online CPUs, also follow hotplug by
>> creating and deleting as CPUs come and go.
>>
>> Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface")
> 
> Is this a fix that needs to go in now and Cc'd to stable? Or is this
> something that can wait till the next merge window?

We can wait for the next merge window... it is a non-trivial fix.

-- Daniel
> -- Steve

[GIT PULL] rtla: Fixes for 6.6

2023-10-04 Thread Daniel Bristot de Oliveira

Steven,

Timerlat auto-analysis:

  - Timerlat is reporting thread interference time without thread noise
events occurrence. It was caused because the thread interference variable
was not reset after the analysis of a timerlat activation that did not
hit the threshold.

  - The IRQ handler delay is estimated from the delta of the IRQ latency
reported by timerlat, and the timestamp from IRQ handler start event.
If the delta is near-zero, the drift from the external clock and the
trace event and/or the overhead can cause the value to be negative.
If the value is negative, print a zero-delay.

  - IRQ handlers happening after the timerlat thread event but before
the stop tracing were being reported as IRQ that happened before the
*current* IRQ occurrence. Ignore Previous IRQ noise in this condition
because they are valid only for the *next* timerlat activation.

Timerlat user-space:

  - Timerlat is stopping all user-space thread if a CPU becomes
offline. Do not stop the entire tool if a CPU is/become offline,
but only the thread of the unavailable CPU. Stop the tool only,
if all threads leave because the CPUs become/are offline.

man-pages:

  - Fix command line example in timerlat hist man page.


Please pull the latest rtla-v6.6-fixes tree, which can be found at:


  git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git
rtla-v6.6-fixes

Tag SHA1: 6a0df51b5c0a075a553a5ad73b1fd421f559eb6b
Head SHA1: 81ec384b80ffbda752c230778d39ea620c7e3bcf


Daniel Bristot de Oliveira (4):
  rtla/timerlat_aa: Zero thread sum after every sample analysis
  rtla/timerlat_aa: Fix negative IRQ delay
  rtla/timerlat_aa: Fix previous IRQ delay for IRQs that happens after 
thread sample
  rtla/timerlat: Do not stop user-space if a cpu is offline

Xie XiuQi (1):
  rtla: fix a example in rtla-timerlat-hist.rst


 Documentation/tools/rtla/rtla-timerlat-hist.rst |  4 ++--
 tools/tracing/rtla/src/timerlat_aa.c| 32 -
 tools/tracing/rtla/src/timerlat_u.c |  6 +++--
 3 files changed, 32 insertions(+), 10 deletions(-)
---
diff --git a/Documentation/tools/rtla/rtla-timerlat-hist.rst 
b/Documentation/tools/rtla/rtla-timerlat-hist.rst
index 057db78d4095..03b7f3deb069 100644
--- a/Documentation/tools/rtla/rtla-timerlat-hist.rst
+++ b/Documentation/tools/rtla/rtla-timerlat-hist.rst
@@ -36,11 +36,11 @@ EXAMPLE
 In the example below, **rtla timerlat hist** is set to run for *10* minutes,
 in the cpus *0-4*, *skipping zero* only lines. Moreover, **rtla timerlat
 hist** will change the priority of the *timerlat* threads to run under
-*SCHED_DEADLINE* priority, with a *10us* runtime every *1ms* period. The
+*SCHED_DEADLINE* priority, with a *100us* runtime every *1ms* period. The
 *1ms* period is also passed to the *timerlat* tracer. Auto-analysis is disabled
 to reduce overhead ::
 
-  [root@alien ~]# timerlat hist -d 10m -c 0-4 -P d:100us:1ms -p 1ms --no-aa
+  [root@alien ~]# timerlat hist -d 10m -c 0-4 -P d:100us:1ms -p 1000 --no-aa
   # RTLA timerlat histogram
   # Time unit is microseconds (us)
   # Duration:   0 00:10:00
diff --git a/tools/tracing/rtla/src/timerlat_aa.c 
b/tools/tracing/rtla/src/timerlat_aa.c
index e0ffe69c271c..7093fd5333be 100644
--- a/tools/tracing/rtla/src/timerlat_aa.c
+++ b/tools/tracing/rtla/src/timerlat_aa.c
@@ -159,6 +159,7 @@ static int timerlat_aa_irq_latency(struct timerlat_aa_data 
*taa_data,
taa_data->thread_nmi_sum = 0;
taa_data->thread_irq_sum = 0;
taa_data->thread_softirq_sum = 0;
+   taa_data->thread_thread_sum = 0;
taa_data->thread_blocking_duration = 0;
taa_data->timer_irq_start_time = 0;
taa_data->timer_irq_duration = 0;
@@ -337,7 +338,23 @@ static int timerlat_aa_irq_handler(struct trace_seq *s, 
struct tep_record *recor
taa_data->timer_irq_start_time = start;
taa_data->timer_irq_duration = duration;
 
-   taa_data->timer_irq_start_delay = 
taa_data->timer_irq_start_time - expected_start;
+   /*
+* We are dealing with two different clock sources: the
+* external clock source that timerlat uses as a reference
+* and the clock used by the tracer. There are also two
+* moments: the time reading the clock and the timer in
+* which the event is placed in the buffer (the trace
+* event timestamp). If the processor is slow or there
+* is some hardware noise, the difference between the
+* timestamp and the external clock read can be longer
+* than the IRQ handler delay, resulting in a negative
+* time. If so, set IRQ start delay as 0. In the end,
+* it is less relevant than the noise.
+*/
+   if (expected_start

[PATCH V3] tracing/timerlat: Hotplug support for the user-space interface

2023-09-29 Thread Daniel Bristot de Oliveira

The osnoise/per_cpu/CPU$/timerlat_fd is create for each possible
CPU, but it might create confusion if the CPU is not online.

Create the file only for online CPUs, also follow hotplug by
creating and deleting as CPUs come and go.

Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface")
Signed-off-by: Daniel Bristot de Oliveira 
---

Changes from V2:
  - Better split the code into the generic (per_cpu/cpu$)
and timerlat (/timerlat_fd) specific function (Daniel)
  - Fixed a cpus_read_lock/unlock() usage (kbuild test)
  Link: 
https://lore.kernel.org/lkml/6b9a5f306e488bc77bf8521faeade420a0adf3e4.1695224204.git.bris...@kernel.org/

Changes from V1:
  - Fix compilation issue when !HOTPLUG
  - Fix init interface | hotplug race
  Link: 
https://lore.kernel.org/lkml/b619d9fd08a3bb47018cf40afa95783844a3c1fd.1694789910.git.bris...@kernel.org/

 kernel/trace/trace_osnoise.c | 149 ---
 1 file changed, 121 insertions(+), 28 deletions(-)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index bd0d01d00fb9..f375fb1e721d 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -229,6 +229,19 @@ static inline struct osnoise_variables 
*this_cpu_osn_var(void)
 }
 
 #ifdef CONFIG_TIMERLAT_TRACER
+
+/*
+ * osnoise/per_cpu dir
+ */
+static struct dentry   *osnoise_per_cpu_fd;
+
+struct osnoise_per_cpu_dir {
+   struct dentry *root;
+   struct dentry *timerlat_fd;
+};
+
+static DEFINE_PER_CPU(struct osnoise_per_cpu_dir, osnoise_per_cpu_dir);
+
 /*
  * Runtime information for the timer mode.
  */
@@ -2000,6 +2013,9 @@ static int start_kthread(unsigned int cpu)
char comm[24];
 
if (timerlat_enabled()) {
+   if (!test_bit(OSN_WORKLOAD, _options))
+   return 0;
+
snprintf(comm, 24, "timerlat/%d", cpu);
main = timerlat_main;
} else {
@@ -2065,19 +2081,99 @@ static int start_per_cpu_kthreads(void)
return retval;
 }
 
+#ifdef CONFIG_TIMERLAT_TRACER
+static struct dentry *osnoise_get_per_cpu_dir(long cpu)
+{
+   struct dentry *cpu_dir_entry = per_cpu_ptr(_per_cpu_dir, 
cpu)->root;
+   char cpu_str[30]; /* see trace.c: tracing_init_tracefs_percpu() */
+
+   if (cpu_dir_entry)
+   return cpu_dir_entry;
+
+   if (!osnoise_per_cpu_fd)
+   return NULL;
+
+   snprintf(cpu_str, 30, "cpu%ld", cpu);
+   cpu_dir_entry = tracefs_create_dir(cpu_str, osnoise_per_cpu_fd);
+   if (!cpu_dir_entry)
+   return NULL;
+
+   per_cpu_ptr(_per_cpu_dir, cpu)->root = cpu_dir_entry;
+
+   return cpu_dir_entry;
+}
+
+static void osnoise_put_per_cpu_dir(long cpu)
+{
+   struct dentry *cpu_dir_entry = per_cpu_ptr(_per_cpu_dir, 
cpu)->root;
+
+   if (!cpu_dir_entry)
+   return;
+
+   tracefs_remove(cpu_dir_entry);
+
+   per_cpu_ptr(_per_cpu_dir, cpu)->root = NULL;
+}
+
+static const struct file_operations timerlat_fd_fops;
+
+static int timerlat_add_per_cpu_interface(long cpu)
+{
+   struct dentry *cpu_dir_entry = osnoise_get_per_cpu_dir(cpu);
+   struct dentry *timerlat_fd;
+
+   lockdep_assert_cpus_held();
+
+   if (!cpu_dir_entry)
+   return -ENOMEM;
+
+   timerlat_fd = trace_create_file("timerlat_fd", TRACE_MODE_READ, 
cpu_dir_entry, NULL,
+   _fd_fops);
+   if (!timerlat_fd)
+   goto out_put;
+
+   per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd = timerlat_fd;
+
+   /* Record the CPU */
+   d_inode(timerlat_fd)->i_cdev = (void *)(cpu);
+
+   return 0;
+
+out_put:
+   osnoise_put_per_cpu_dir(cpu);
+   return -ENOMEM;
+}
+
+static void timerlat_rm_per_cpu_interface(long cpu)
+{
+   struct dentry *timerlat_fd = per_cpu_ptr(_per_cpu_dir, 
cpu)->timerlat_fd;
+
+   if (timerlat_fd) {
+   tracefs_remove(timerlat_fd);
+   per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd = NULL;
+   }
+
+   osnoise_put_per_cpu_dir(cpu);
+}
+#elif defined(CONFIG_HOTPLUG_CPU)
+static int timerlat_add_per_cpu_interface(long cpu) { return 0; };
+static void timerlat_rm_per_cpu_interface(long cpu) {};
+#endif
+
 #ifdef CONFIG_HOTPLUG_CPU
 static void osnoise_hotplug_workfn(struct work_struct *dummy)
 {
unsigned int cpu = smp_processor_id();
 
mutex_lock(_types_lock);
-
-   if (!osnoise_has_registered_instances())
-   goto out_unlock_trace;
-
mutex_lock(_lock);
cpus_read_lock();
 
+   timerlat_add_per_cpu_interface(cpu);
+
+   if (!osnoise_has_registered_instances())
+   goto out_unlock;
+
if (!cpumask_test_cpu(cpu, _cpumask))
goto out_unlock;
 
@@ -2086,7 +2182,6 @@ static void osnoise_hotplug_workfn(struct work_struct 
*dummy)
 out_unlock:
cpus_read_unlock();
mutex_un

[PATCH V2] tracing/timerlat: Hotplug support for the user-space interface

2023-09-20 Thread Daniel Bristot de Oliveira

The osnoise/per_cpu/CPU$/timerlat_fd is create for each possible
CPU, but it might create confusion if the CPU is not online.

Create the file only for online CPUs, also follow hotplug by
creating and deleting as CPUs come and go.

Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface")
Signed-off-by: Daniel Bristot de Oliveira 
---
  Changes from V1:
- Fix compilation issue when !HOTPLUG
- Fix init interface | hotplug race
Link: 
https://lore.kernel.org/lkml/b619d9fd08a3bb47018cf40afa95783844a3c1fd.1694789910.git.bris...@kernel.org/

 kernel/trace/trace_osnoise.c | 112 +++
 1 file changed, 86 insertions(+), 26 deletions(-)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index bd0d01d00fb9..8422562d9864 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -229,6 +229,19 @@ static inline struct osnoise_variables 
*this_cpu_osn_var(void)
 }
 
 #ifdef CONFIG_TIMERLAT_TRACER
+
+/*
+ * osnoise/per_cpu dir
+ */
+static struct dentry   *osnoise_per_cpu_fd;
+
+struct osnoise_per_cpu_dir {
+   struct dentry *root;
+   struct dentry *timerlat_fd;
+};
+
+static DEFINE_PER_CPU(struct osnoise_per_cpu_dir, osnoise_per_cpu_dir);
+
 /*
  * Runtime information for the timer mode.
  */
@@ -2000,6 +2013,9 @@ static int start_kthread(unsigned int cpu)
char comm[24];
 
if (timerlat_enabled()) {
+   if (!test_bit(OSN_WORKLOAD, _options))
+   return 0;
+
snprintf(comm, 24, "timerlat/%d", cpu);
main = timerlat_main;
} else {
@@ -2065,19 +2081,64 @@ static int start_per_cpu_kthreads(void)
return retval;
 }
 
+#ifdef CONFIG_TIMERLAT_TRACER
+static const struct file_operations timerlat_fd_fops;
+static int timerlat_add_per_cpu_interface(long cpu)
+{
+   struct dentry *timerlat_fd, *cpu_dir_fd;
+   char cpu_str[30]; /* see trace.c: tracing_init_tracefs_percpu() */
+
+   if (!osnoise_per_cpu_fd)
+   return 0;
+
+   snprintf(cpu_str, 30, "cpu%ld", cpu);
+   cpu_dir_fd = tracefs_create_dir(cpu_str, osnoise_per_cpu_fd);
+
+   if (cpu_dir_fd) {
+   timerlat_fd = trace_create_file("timerlat_fd", TRACE_MODE_READ,
+  cpu_dir_fd, NULL, 
_fd_fops);
+   WARN_ON_ONCE(!timerlat_fd);
+   per_cpu_ptr(_per_cpu_dir, cpu)->root = cpu_dir_fd;
+   per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd = 
timerlat_fd;
+
+   /* Record the CPU */
+   d_inode(timerlat_fd)->i_cdev = (void *)(cpu);
+
+   return 0;
+   }
+
+   return -ENOMEM;
+}
+
+static void timerlat_rm_per_cpu_interface(long cpu)
+{
+   struct dentry *cpu_dir = per_cpu_ptr(_per_cpu_dir, cpu)->root;
+
+   if (cpu_dir) {
+   tracefs_remove(cpu_dir);
+   per_cpu_ptr(_per_cpu_dir, cpu)->root = NULL;
+   per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd = NULL;
+   }
+}
+#elif defined(CONFIG_HOTPLUG_CPU)
+static int timerlat_add_per_cpu_interface(long cpu) { return 0; };
+static void timerlat_rm_per_cpu_interface(long cpu) {};
+#endif
+
 #ifdef CONFIG_HOTPLUG_CPU
 static void osnoise_hotplug_workfn(struct work_struct *dummy)
 {
unsigned int cpu = smp_processor_id();
 
mutex_lock(_types_lock);
-
-   if (!osnoise_has_registered_instances())
-   goto out_unlock_trace;
-
mutex_lock(_lock);
cpus_read_lock();
 
+   timerlat_add_per_cpu_interface(cpu);
+
+   if (!osnoise_has_registered_instances())
+   goto out_unlock;
+
if (!cpumask_test_cpu(cpu, _cpumask))
goto out_unlock;
 
@@ -2086,7 +2147,6 @@ static void osnoise_hotplug_workfn(struct work_struct 
*dummy)
 out_unlock:
cpus_read_unlock();
mutex_unlock(_lock);
-out_unlock_trace:
mutex_unlock(_types_lock);
 }
 
@@ -2106,6 +2166,7 @@ static int osnoise_cpu_init(unsigned int cpu)
  */
 static int osnoise_cpu_die(unsigned int cpu)
 {
+   timerlat_rm_per_cpu_interface(cpu);
stop_kthread(cpu);
return 0;
 }
@@ -2708,10 +2769,7 @@ static int init_timerlat_stack_tracefs(struct dentry 
*top_dir)
 
 static int osnoise_create_cpu_timerlat_fd(struct dentry *top_dir)
 {
-   struct dentry *timerlat_fd;
-   struct dentry *per_cpu;
-   struct dentry *cpu_dir;
-   char cpu_str[30]; /* see trace.c: tracing_init_tracefs_percpu() */
+   int retval;
long cpu;
 
/*
@@ -2720,29 +2778,24 @@ static int osnoise_create_cpu_timerlat_fd(struct dentry 
*top_dir)
 * Because osnoise/timerlat have a single workload, having
 * multiple files like these are wast of memory.
 */
-   per_cpu = tracefs_create_dir("per_cpu", top_dir);
-   if (!per_cpu)
+   osnoise_per_cpu_fd = tra

Re: [PATCH] tracing/timerlat: Hotplug support for the user-space interface

2023-09-18 Thread Daniel Bristot de Oliveira

On 9/16/23 02:21, kernel test robot wrote:
> Hi Daniel,
> 
> kernel test robot noticed the following build warnings:
> 
> [auto build test WARNING on linus/master]
> [also build test WARNING on v6.6-rc1 next-20230915]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:
> https://github.com/intel-lab-lkp/linux/commits/Daniel-Bristot-de-Oliveira/tracing-timerlat-Hotplug-support-for-the-user-space-interface/20230915-230157
> base:   linus/master
> patch link:
> https://lore.kernel.org/r/b619d9fd08a3bb47018cf40afa95783844a3c1fd.1694789910.git.bristot%40kernel.org
> patch subject: [PATCH] tracing/timerlat: Hotplug support for the user-space 
> interface
> config: um-randconfig-002-20230916 
> (https://download.01.org/0day-ci/archive/20230916/202309160854.saw0rium-...@intel.com/config)
> 

^^ The config has no timerlat and no hotplug...

compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce (this is a W=1 build): 
> (https://download.01.org/0day-ci/archive/20230916/202309160854.saw0rium-...@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version 
> of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot 
> | Closes: 
> https://lore.kernel.org/oe-kbuild-all/202309160854.saw0rium-...@intel.com/
> 
> All warnings (new ones prefixed by >>):
> 
>>> kernel/trace/trace_osnoise.c:2125:13: warning: 
>>> 'timerlat_rm_per_cpu_interface' defined but not used [-Wunused-function]
> 2125 | static void timerlat_rm_per_cpu_interface(long cpu) {};
>  | ^
>>> kernel/trace/trace_osnoise.c:2124:12: warning: 
>>> 'timerlat_add_per_cpu_interface' defined but not used [-Wunused-function]
> 2124 | static int timerlat_add_per_cpu_interface(long cpu) { return 0; };
>  |^~

These functions are called when hotplug but no timerlat when no hotplug: 
defined but not used.

> 
> vim +/timerlat_rm_per_cpu_interface +2125 kernel/trace/trace_osnoise.c
> 
>   2112
>   2113static void timerlat_rm_per_cpu_interface(long cpu)
>   2114{
>   2115struct dentry *cpu_dir = 
> per_cpu_ptr(_per_cpu_dir, cpu)->root;
>   2116
>   2117if (cpu_dir) {
>   2118tracefs_remove(cpu_dir);
>   2119per_cpu_ptr(_per_cpu_dir, cpu)->root = 
> NULL;
>   2120per_cpu_ptr(_per_cpu_dir, 
> cpu)->timerlat_fd = NULL;
>   2121}
>   2122}
>   2123#else
>> 2124 static int timerlat_add_per_cpu_interface(long cpu) { return 0; };
>> 2125 static void timerlat_rm_per_cpu_interface(long cpu) {};
>   2126#endif
>   2127
> 

Fixing it.

-- Daniel

[PATCH] tracing/timerlat: Hotplug support for the user-space interface

2023-09-15 Thread Daniel Bristot de Oliveira

The osnoise/per_cpu/CPU$/timerlat_fd is create for each possible
CPU, but it might create confusion if the CPU is not online.

Create the file only for online CPUs, also follow hotplug by
creating and deleting as CPUs come and go.

Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface")
Signed-off-by: Daniel Bristot de Oliveira 
---
 kernel/trace/trace_osnoise.c | 101 ++-
 1 file changed, 77 insertions(+), 24 deletions(-)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index bd0d01d00fb9..1af01eec3e36 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -229,6 +229,19 @@ static inline struct osnoise_variables 
*this_cpu_osn_var(void)
 }
 
 #ifdef CONFIG_TIMERLAT_TRACER
+
+/*
+ * osnoise/per_cpu dir
+ */
+static struct dentry   *osnoise_per_cpu_fd;
+
+struct osnoise_per_cpu_dir {
+   struct dentry *root;
+   struct dentry *timerlat_fd;
+};
+
+static DEFINE_PER_CPU(struct osnoise_per_cpu_dir, osnoise_per_cpu_dir);
+
 /*
  * Runtime information for the timer mode.
  */
@@ -2000,6 +2013,9 @@ static int start_kthread(unsigned int cpu)
char comm[24];
 
if (timerlat_enabled()) {
+   if (!test_bit(OSN_WORKLOAD, _options))
+   return 0;
+
snprintf(comm, 24, "timerlat/%d", cpu);
main = timerlat_main;
} else {
@@ -2065,19 +2081,64 @@ static int start_per_cpu_kthreads(void)
return retval;
 }
 
+#ifdef CONFIG_TIMERLAT_TRACER
+static const struct file_operations timerlat_fd_fops;
+static int timerlat_add_per_cpu_interface(long cpu)
+{
+   struct dentry *timerlat_fd, *cpu_dir_fd;
+   char cpu_str[30]; /* see trace.c: tracing_init_tracefs_percpu() */
+
+   if (!osnoise_per_cpu_fd)
+   return 0;
+
+   snprintf(cpu_str, 30, "cpu%ld", cpu);
+   cpu_dir_fd = tracefs_create_dir(cpu_str, osnoise_per_cpu_fd);
+
+   if (cpu_dir_fd) {
+   timerlat_fd = trace_create_file("timerlat_fd", TRACE_MODE_READ,
+  cpu_dir_fd, NULL, 
_fd_fops);
+   WARN_ON_ONCE(!timerlat_fd);
+   per_cpu_ptr(_per_cpu_dir, cpu)->root = cpu_dir_fd;
+   per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd = 
timerlat_fd;
+
+   /* Record the CPU */
+   d_inode(timerlat_fd)->i_cdev = (void *)(cpu);
+
+   return 0;
+   }
+
+   return -ENOMEM;
+}
+
+static void timerlat_rm_per_cpu_interface(long cpu)
+{
+   struct dentry *cpu_dir = per_cpu_ptr(_per_cpu_dir, cpu)->root;
+
+   if (cpu_dir) {
+   tracefs_remove(cpu_dir);
+   per_cpu_ptr(_per_cpu_dir, cpu)->root = NULL;
+   per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd = NULL;
+   }
+}
+#else
+static int timerlat_add_per_cpu_interface(long cpu) { return 0; };
+static void timerlat_rm_per_cpu_interface(long cpu) {};
+#endif
+
 #ifdef CONFIG_HOTPLUG_CPU
 static void osnoise_hotplug_workfn(struct work_struct *dummy)
 {
unsigned int cpu = smp_processor_id();
 
mutex_lock(_types_lock);
-
-   if (!osnoise_has_registered_instances())
-   goto out_unlock_trace;
-
mutex_lock(_lock);
cpus_read_lock();
 
+   timerlat_add_per_cpu_interface(cpu);
+
+   if (!osnoise_has_registered_instances())
+   goto out_unlock;
+
if (!cpumask_test_cpu(cpu, _cpumask))
goto out_unlock;
 
@@ -2086,7 +2147,6 @@ static void osnoise_hotplug_workfn(struct work_struct 
*dummy)
 out_unlock:
cpus_read_unlock();
mutex_unlock(_lock);
-out_unlock_trace:
mutex_unlock(_types_lock);
 }
 
@@ -2106,6 +2166,7 @@ static int osnoise_cpu_init(unsigned int cpu)
  */
 static int osnoise_cpu_die(unsigned int cpu)
 {
+   timerlat_rm_per_cpu_interface(cpu);
stop_kthread(cpu);
return 0;
 }
@@ -2708,10 +2769,7 @@ static int init_timerlat_stack_tracefs(struct dentry 
*top_dir)
 
 static int osnoise_create_cpu_timerlat_fd(struct dentry *top_dir)
 {
-   struct dentry *timerlat_fd;
-   struct dentry *per_cpu;
-   struct dentry *cpu_dir;
-   char cpu_str[30]; /* see trace.c: tracing_init_tracefs_percpu() */
+   int retval;
long cpu;
 
/*
@@ -2720,29 +2778,24 @@ static int osnoise_create_cpu_timerlat_fd(struct dentry 
*top_dir)
 * Because osnoise/timerlat have a single workload, having
 * multiple files like these are wast of memory.
 */
-   per_cpu = tracefs_create_dir("per_cpu", top_dir);
-   if (!per_cpu)
+   osnoise_per_cpu_fd = tracefs_create_dir("per_cpu", top_dir);
+   if (!osnoise_per_cpu_fd)
return -ENOMEM;
 
-   for_each_possible_cpu(cpu) {
-   snprintf(cpu_str, 30, "cpu%ld", cpu);
-   cpu_dir = tracefs_cre

[PATCH] tools/rtla: Do not stop user-space if a cpu is offline

2023-09-15 Thread Daniel Bristot de Oliveira

If no CPU list is passed, timerlat in user-space will dispatch
one thread per sysconf(_SC_NPROCESSORS_CONF). However, not all
CPU might be available, for instance, if HT is disabled.

Currently, rtla timerlat is stopping the session if an user-space
thread cannot set affinity to a CPU, or if a running user-space
thread is killed. However, this is too restrictive.

So, reduce the error to a debug message, and rtla timerlat run as
long as there is at least one user-space thread alive.

Fixes: cdca4f4e5e8e ("rtla/timerlat_top: Add timerlat user-space support")
Signed-off-by: Daniel Bristot de Oliveira 
---
 tools/tracing/rtla/src/timerlat_u.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/tracing/rtla/src/timerlat_u.c 
b/tools/tracing/rtla/src/timerlat_u.c
index 05e310696dd5..01dbf9a6b5a5 100644
--- a/tools/tracing/rtla/src/timerlat_u.c
+++ b/tools/tracing/rtla/src/timerlat_u.c
@@ -45,7 +45,7 @@ static int timerlat_u_main(int cpu, struct timerlat_u_params 
*params)
 
retval = sched_setaffinity(gettid(), sizeof(set), );
if (retval == -1) {
-   err_msg("Error setting user thread affinity\n");
+   debug_msg("Error setting user thread affinity %d, is the CPU 
online?\n", cpu);
exit(1);
}
 
@@ -193,7 +193,9 @@ void *timerlat_u_dispatcher(void *data)
procs_count--;
}
}
-   break;
+
+   if (!procs_count)
+   break;
}
 
sleep(1);
-- 
2.38.1

Re: [PATCH v2] verification/dot2k: Delete duplicate imports

2023-09-14 Thread Daniel Bristot de Oliveira

On 9/6/23 17:57, Alessandro Carminati (Red Hat) wrote:
> The presence of duplicate import lines appears to be a typo.
> Removing them.
> 
> Fixes: 24bce201d798 ("tools/rv: Add dot2k")
> Signed-off-by: Alessandro Carminati (Red Hat) 

Queued, thanks!

https://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git/commit/?h=tools/verification=5a9587fea055163026b6d22d593fc64ed04de3a6

-- Daniel

Re: [PATCH] kernel:irq:manage: request threaded irq with a specified priority

2021-04-16 Thread Daniel Bristot de Oliveira

On 4/16/21 6:57 AM, chensong wrote:
> 
> 
> On 2021/4/13 下午4:39, Thomas Gleixner wrote:
>> On Tue, Apr 13 2021 at 14:19, Song Chen wrote:
>>> In general, irq handler thread will be assigned a default priority which
>>> is MAX_RT_PRIO/2, as a result, no one can preempt others.
>>>
>>> Here is the case I found in a real project, an interrupt int_a is
>>> coming, wakes up its handler handler_a and handler_a wakes up a
>>> userspace RT process task_a.
>>>
>>> However, if another irq handler handler_b which has nothing to do
>>> with any RT tasks is running when int_a is coming, handler_a can't
>>> preempt handler_b, as a result, task_a can't be waken up immediately
>>> as expected until handler_b gives up cpu voluntarily. In this case,
>>> determinism breaks.
>>
>> It breaks because the system designer failed to assign proper priorities
>> to the irq threads int_a, int_b and to the user space process task_a.
> 
> yes, it's designers' responsibility to assign proper priorities, but
> kernel is also obliged to provide interfaces for those designers.

There is no optimal priority assignment for fixed priority schedulers without a
prior knowledge of all tasks (and their timing behavior, e.g., exec time,
activation pattern...). So, the developer will never know what is the best
priority. Such fine tune should be done by the user.

> 
> chrt can help designers in this case, however, the truth is lot of customers 
> are
> not familiar with it.

And making this change in C in kernel is just turning it even more complex.

what's more, chrt can also apply to userspace rt task, but
> userspace also has sched_setscheduler to assgin proper priority inside code 
> like
> cyclictest, why can't driver writers have another choice?

The developer of task_a can also use sched_setscheduler() to adjust the priority
of the handler_a - or even better, decrease the priority of the handler_b as it
is not that important. The developer is supposed to know how to change priority
because task_a is RT too.

Note that the user sets the priority on cyclictest (-p).

-- Daniel

Re: [RFC PATCH 1/5] tracing/hwlat: Add a cpus file specific for hwlat_detector

2021-04-15 Thread Daniel Bristot de Oliveira

On 4/15/21 3:49 PM, Steven Rostedt wrote:
> OK, for osnoise, I can see how it is useful. But as you said above, for
> hwlat tracer, it's not as useful.

I agree, it is not as useful.

-- Daniel

Re: [RFC PATCH 5/5] tracing: Add the osnoise tracer

2021-04-15 Thread Daniel Bristot de Oliveira

On 4/14/21 7:14 PM, Steven Rostedt wrote:
> On Thu,  8 Apr 2021 16:13:23 +0200
> Daniel Bristot de Oliveira  wrote:
> 
>> In the context of high-performance computing (HPC), the Operating System
>> Noise (osnoise) refers to the interference experienced by an application
>> due to activities inside the operating system. In the context of Linux,
>> NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the
>> system. Moreover, hardware-related jobs can also cause noise, for example,
>> via SMIs.
>>
>> hwlat_detector is one of the tools used to identify the most complex
>> source of noise: hardware noise.
>>
>> In a nutshell, the hwlat_detector creates a thread that runs
>> periodically for a given period. At the beginning of a period, the thread
>> disables interrupt and starts sampling. While running, the hwlatd
>> thread reads the time in a loop. As interrupts are disabled, threads,
>> IRQs, and SoftIRQs cannot interfere with the hwlatd thread. Hence, the
>> cause of any gap between two different reads of the time roots either on
>> NMI or in the hardware itself. At the end of the period, hwlatd enables
>> interrupts and reports the max observed gap between the reads. It also
>> prints an NMI occurrence counter. If the output does not report NMI
>> executions, the user can conclude that the hardware is the culprit for
>> the latency. The hwlat detects the NMI execution by observing
>> the entry and exit of an NMI.
>>
>> The osnoise tracer leverages the hwlat_detector by running a
>> similar loop with preemption, SoftIRQs and IRQs enabled, thus allowing
>> all the sources of osnoise during its execution. Using the same approach
>> of hwlat, osnoise takes note of the entry and exit point of any
>> source of interferences, increasing a per-cpu interference counter. The
>> osnoise tracer also saves an interference counter for each source of
>> interference. The interference counter for NMI, IRQs, SoftIRQs, and
>> threads is increased anytime the tool observes these interferences' entry
>> events. When a noise happens without any interference from the operating
>> system level, the hardware noise counter increases, pointing to a
>> hardware-related noise. In this way, osnoise can account for any
>> source of interference. At the end of the period, the osnoise tracer
>> prints the sum of all noise, the max single noise, the percentage of CPU
>> available for the thread, and the counters for the noise sources.
>>
>> Usage
>>
>> Write the ASCII text osnoise into the current_tracer file of the
>> tracing system (generally mounted at /sys/kernel/tracing or
>> /sys/kernel/debug/tracing).
>>
>> For example::
>>
>> [root@f32 ~]# cd /sys/kernel/tracing/
>> [root@f32 tracing]# echo osnoise > current_tracer
>>
>> It is possible to follow the trace by reading the trace trace file::
>>
>> [root@f32 tracing]# cat trace
>> # tracer: osnoise
>> #
>> #_-=> irqs-off
>> #   / _=> need-resched
>> #  | / _---=> hardirq/softirq
>> #  || / _--=> preempt-depth  
>>   MAX
>> #  || /  
>>SINGLE Interference counters:
>> #     RUNTIME  NOISE 
>>   % OF CPU  NOISE+-+
>> #   TASK-PID  CPU#    TIMESTAMPIN US   IN US 
>>  AVAILABLE  IN US HWNMIIRQ   SIRQ THREAD
>> #  | | |     |   | | 
>>||  |  |  |  |  |
>><...>-859 [000] 81.637220: 100190 
>>  99.98100   9 18  0   1007 18  1
>><...>-860 [001] 81.638154: 100656 
>>  99.93440  74 23  0   1006 16  3
>><...>-861 [002] 81.638193: 100   5675 
>>  99.43250 202  6  0   1013 25 21
>><...>-862 [003] 81.638242: 100125 
>>  99.98750  45  1  0   1011 23  0
>><...>-863 [004] 81.638260: 100   1721 
>>  99.82790 168  7  0   1002 49 41
>>

Re: [RFC PATCH 3/5] tracing/hwlat: Implement the per-cpu mode

2021-04-15 Thread Daniel Bristot de Oliveira

On 4/14/21 4:41 PM, Steven Rostedt wrote:
> On Thu,  8 Apr 2021 16:13:21 +0200
> Daniel Bristot de Oliveira  wrote:
> 
>> Implements the per-cpu mode in which a sampling thread is created for
>> each cpu in the "cpus" (and tracing_mask).
>>
>> The per-cpu mode has the potention to speed up the hwlat detection by
>> running on multiple CPUs at the same time.
> 
> And totally slow down the entire system in the process ;-)

Too :-) But this is not the default config... So it should be an intentional 
change.

>>
>> Cc: Jonathan Corbet 
>> Cc: Steven Rostedt 
>> Cc: Ingo Molnar 
>> Cc: Peter Zijlstra 
>> Cc: Thomas Gleixner 
>> Cc: Alexandre Chartre 
>> Cc: Clark Willaims 
>> Cc: John Kacur 
>> Cc: Juri Lelli 
>> Cc: linux-...@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Daniel Bristot de Oliveira 
>>
>> ---
>>  Documentation/trace/hwlat_detector.rst |   6 +-
>>  kernel/trace/trace_hwlat.c | 171 +++--
>>  2 files changed, 137 insertions(+), 40 deletions(-)
>>
>> diff --git a/Documentation/trace/hwlat_detector.rst 
>> b/Documentation/trace/hwlat_detector.rst
>> index f63fdd867598..7a6fab105b29 100644
>> --- a/Documentation/trace/hwlat_detector.rst
>> +++ b/Documentation/trace/hwlat_detector.rst
>> @@ -85,10 +85,12 @@ the available options are:
>>  
>>   - none:do not force migration
>>   - round-robin: migrate across each CPU specified in cpus between each 
>> window
>> + - per-cpu: create a per-cpu thread for each cpu in cpus
>>  
>>  By default, hwlat detector will also obey the tracing_cpumask, so the thread
>>  will be placed only in the set of cpus that is both on the hwlat detector's
>>  cpus and in the global tracing_cpumask file. The user can overwrite the
>>  cpumask by setting it manually. Changing the hwlatd affinity externally,
>> -e.g., via taskset tool, will disable the round-robin migration.
>> -
>> +e.g., via taskset tool, will disable the round-robin migration. In the
>> +per-cpu mode, the per-cpu thread (hwlatd/CPU) will be pinned to its relative
>> +cpu, and its affinity cannot be changed.
>> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
>> index 3818200c9e24..52968ea312df 100644
>> --- a/kernel/trace/trace_hwlat.c
>> +++ b/kernel/trace/trace_hwlat.c
>> @@ -34,7 +34,7 @@
>>   * Copyright (C) 2008-2009 Jon Masters, Red Hat, Inc. 
>>   * Copyright (C) 2013-2016 Steven Rostedt, Red Hat, Inc. 
>> 
>>   *
>> - * Includes useful feedback from Clark Williams 
>> + * Includes useful feedback from Clark Williams 
> 
> Interesting update ;-)

Should I make it a separated patch? :-)

>>   *
>>   */
>>  #include 
>> @@ -54,9 +54,6 @@ static struct trace_array  *hwlat_trace;
>>  #define DEFAULT_SAMPLE_WIDTH50  /* 0.5s */
>>  #define DEFAULT_LAT_THRESHOLD   10  /* 10us */
>>  
>> -/* sampling thread*/
>> -static struct task_struct *hwlat_kthread;
>> -
>>  static struct dentry *hwlat_sample_width;   /* sample width us */
>>  static struct dentry *hwlat_sample_window;  /* sample window us */
>>  static struct dentry *hwlat_cpumask_dentry; /* hwlat cpus allowed */
>> @@ -65,19 +62,27 @@ static struct dentry *hwlat_thread_mode; /* hwlat thread 
>> mode */
>>  enum {
>>  MODE_NONE = 0,
>>  MODE_ROUND_ROBIN,
>> +MODE_PER_CPU,
>>  MODE_MAX
>>  };
>>  
>> -static char *thread_mode_str[] = { "none", "round-robin" };
>> +static char *thread_mode_str[] = { "none", "round-robin", "per-cpu" };
>>  
>>  /* Save the previous tracing_thresh value */
>>  static unsigned long save_tracing_thresh;
>>  
>> -/* NMI timestamp counters */
>> -static u64 nmi_ts_start;
>> -static u64 nmi_total_ts;
>> -static int nmi_count;
>> -static int nmi_cpu;
>> +/* runtime kthread data */
>> +struct hwlat_kthread_data {
>> +struct task_struct *kthread;
>> +/* NMI timestamp counters */
>> +u64 nmi_ts_start;
>> +u64 nmi_total_ts;
>> +int nmi_count;
>> +int nmi_cpu;
>> +};
>> +
>> +struct hwlat_kthread_data hwlat_single_cpu_data;
>> +DEFINE_PER_CPU(struct hwlat_kthread_data, hwlat_per_cpu_data);
>>  
>>  /* Tells NMIs to call back to the hwlat tracer to record timestamps */
>>  bool trace_hwlat_callback_enabled;
>> @@ -114,6 +119,14 @@ static struct hwlat_data {
&g

Re: [RFC PATCH 2/5] tracing/hwlat: Implement the mode config option

2021-04-15 Thread Daniel Bristot de Oliveira

On 4/14/21 4:30 PM, Steven Rostedt wrote:
> On Thu,  8 Apr 2021 16:13:20 +0200
> Daniel Bristot de Oliveira  wrote:
> 
>> +/**
>> + * hwlat_mode_write - Write function for "mode" entry
>> + * @filp: The active open file structure
>> + * @ubuf: The user buffer that contains the value to write
>> + * @cnt: The maximum number of bytes to write to "file"
>> + * @ppos: The current position in @file
>> + *
>> + * This function provides a write implementation for the "mode" interface
>> + * to the hardware latency detector. hwlatd has different operation modes.
>> + * The "none" sets the allowed cpumask for a single hwlatd thread at the
>> + * startup and lets the scheduler handle the migration. The default mode is
>> + * the "round-robin" one, in which a single hwlatd thread runs, migrating
>> + * among the allowed CPUs in a round-robin fashion.
>> + */
>> +static ssize_t hwlat_mode_write(struct file *filp, const char __user *ubuf,
>> + size_t cnt, loff_t *ppos)
>> +{
>> +const char *mode;
>> +char buf[64];
>> +int ret;
>> +int i;
>> +
>> +if (hwlat_busy)
>> +return -EBUSY;
> 
> So we can't switch modes while running?
> 

As you mentioned in the patch 3/5, this limitation was added because
of the running threads. But, yes, stopping and starting the tracer to
re-create the threads should work as well. I will try it for the next round.

> Also, with this implemented, you can remove the disable_migrate variable,
> and just switch the mode to NONE when it's detected that the affinity mask
> of the thread has been changed.
> 

That was my initial intention with the NONE mode, but I feared breaking
something by removing the "migrate_disable" logic. If you do not think it is
a problem, I will remove the migrate disable and just change the mode.

-- Daniel
> -- Steve
> 
> 
>> +
>> +if (cnt >= sizeof(buf))
>> +return -EINVAL;
>> +
>> +if (copy_from_user(buf, ubuf, cnt))
>> +return -EFAULT;
>> +
>> +buf[cnt] = 0;
>> +
>> +mode = strstrip(buf);
>> +
>> +ret = -EINVAL;
>> +
>> +for (i = 0; i < MODE_MAX; i++) {
>> +if (strcmp(mode, thread_mode_str[i]) == 0) {
>> +hwlat_data.thread_mode = i;
>> +ret = cnt;
>> +}
>> +}
>> +
>> +*ppos += cnt;
>> +
>> +return cnt;
>> +}
>> +
>> +
>

Re: [RFC PATCH 1/5] tracing/hwlat: Add a cpus file specific for hwlat_detector

2021-04-15 Thread Daniel Bristot de Oliveira

On 4/14/21 4:10 PM, Steven Rostedt wrote:
> On Thu,  8 Apr 2021 16:13:19 +0200
> Daniel Bristot de Oliveira  wrote:
> 
>> Provides a "cpus" interface to the hardware latency detector. By
>> default, it lists all CPUs, allowing hwlatd threads to run on any online
>> CPU of the system.
>>
>> It serves to restrict the execution of hwlatd to the set of CPUs writing
>> via this interface. Note that hwlatd also respects the "tracing_cpumask."
>> Hence, hwlatd threads will run only on the set of CPUs allowed here AND
>> on "tracing_cpumask."
>>
>> Why not keep just "tracing_cpumask"? Because the user might be interested
>> in tracing what is running on other CPUs. For instance, one might run
>> hwlatd in one HT CPU while observing what is running on the sibling HT
>> CPU. The cpu list format is also more intuitive.
>>
>> Also in preparation to the per-cpu mode.
> 
> OK, I'm still not convinced that you couldn't use tracing_cpumask here.
> Because we have instances, and tracing_cpumask is defined per instance, you
> could simply do:
> 
>  # cd /sys/kernel/tracing
>  # mkdir instances/hwlat
>  # echo a > instances/hwlat/tracing_cpumask
>  # echo hwlat > instances/hwlat/current_tracer
> 
> Now the tracing_cpumask above only affects the hwlat tracer.
> 
> I'm just reluctant to add more tracing files if the current ones can be
> used without too much trouble. For being intuitive, let's make user space
> tools hide the nastiness of the kernel interface ;-)

[discussing about the cpus file in both hwlat and osnoise here...]

I see your point, but by having two different instances give you two
different output "trace" files... and it is not that always practical to
merge them when using only the tracefs interface (I like to use it, and
it is very handy when dealing with immutable systems, on customers...).

Thinking aloud, one might say: sort the two trace files by timestamp...

and other might reply: but some lines do not have a timestamp associated,
e.g., the stacktrace.

Anyway, the cpus file on hwlat is not a super essential thing, I agree...
interrupts are disabled, so not much could go wrong (although I really
needed the trace from a sibling cpu in a real case).

But for the osnoise tracer the cpus file is really useful. For instance, on a 
system with the CPU 7 isolated:

- %< -
 # echo 7 > osnoise/cpus
 # echo target_cpu == 7 > events/sched/sched_wakeup/filter 
 # echo stacktrace if target_cpu == 7 > events/sched/sched_wakeup/trigger
 # echo 1 > events/sched/sched_wakeup/enable
 # echo osnoise:thread_noise > set_event 
 # echo osnoise > current_tracer
 # cat trace 
[find...]
 kworker/0:1-7   [000] d..5  1820.717780: 
 => trace_event_raw_event_sched_wakeup_template
 => __traceiter_sched_wakeup
 => ttwu_do_wakeup
 => try_to_wake_up
 => __queue_work
 => queue_delayed_work_on
 => vmstat_shepherd
 => process_one_work
 => worker_thread
 => kthread
 => ret_from_fork
 kworker/7:1-410 [007] d..3  1820.717790: thread_noise: kworker/7:1:410 
start 1820.717786519 duration 3626 ns
   osnoise/7-1000[007]   1821.582340: 100 90  99.99100  
15  1  0 12  6  1
- >% -

It was possible to easily find that the '1' thread noise was a kworker,
dispatched from CPU 0, and that it was dispatched by "vmstat_shepherd".

Also, the osnoise dir is not added to a new instance... so, it only
costs "one" file...

> -- Steve
>

Re: [RFC PATCH 5/5] tracing: Add the osnoise tracer

2021-04-09 Thread Daniel Bristot de Oliveira

On 4/8/21 5:58 PM, Jonathan Corbet wrote:
> Daniel Bristot de Oliveira  writes:
> 
> A quick nit:
> 
>>  Documentation/trace/osnoise_tracer.rst |  149 ++
>>  include/linux/ftrace_irq.h |   16 +
>>  include/trace/events/osnoise.h |  141 ++
>>  kernel/trace/Kconfig   |   34 +
>>  kernel/trace/Makefile  |1 +
>>  kernel/trace/trace.h   |9 +-
>>  kernel/trace/trace_entries.h   |   27 +
>>  kernel/trace/trace_osnoise.c   | 1714 
>>  kernel/trace/trace_output.c|   72 +-
>>  9 files changed, 2159 insertions(+), 4 deletions(-)
>>  create mode 100644 Documentation/trace/osnoise_tracer.rst
>>  create mode 100644 include/trace/events/osnoise.h
>>  create mode 100644 kernel/trace/trace_osnoise.c
> When you create a new RST file, you need to add it to an index.rst (or
> similar) file so that it gets incorporated into the docs build.


ack!

> 
> The document itself looks good on a quick read.  If you're making
> another pass over it, you might consider reducing the ``markup noise`` a
> bit; we try to keep that to a minimum in the kernel docs.  But otherwise
> thanks for writing it!

Thanks for the review, Jon. I will reduce the `` markup (on this, and on some
other docs that are about to come :-))

-- Daniel
> jon
>

[RFC PATCH 5/5] tracing: Add the osnoise tracer

2021-04-08 Thread Daniel Bristot de Oliveira

tracer during the associated runtime.
 - The % OF CPU AVAILABLE reports the percentage of CPU available for
   the osnoise thread during the runtime window.
 - The MAX SINGLE NOISE IN US reports the maximum single noise observed
   during the runtime window.
 - The Interference counters display how many each of the respective
   interference happened during the runtime window.

Note that the example above shows a high number of HW noise samples.
The reason being is that this sample was taken on a virtual machine,
and the host interference is detected as a hardware interference.

Tracer options

The tracer has a set of options inside the osnoise directory, they are:

 - cpus: CPUs at which a osnoise thread will execute.
 - period_us: the period of the osnoise thread.
 - runtime_us: how long an osnoise thread will look for noise.
 - stop_tracing_single_us: stop the system tracing of a single noise
   higher than the configured value is happens. Writing 0 disables this
   option.
 - stop_tracing_total_us: stop the system tracing of a NOISE IN USE
   higher than the configured value is happens. Writing 0 disables this
   option.
 - tolerance_ns: the minimum delta between two time() reads to be
   considered as noise.

Additional Tracing

In addition to the tracer, a set of tracepoints were added to
facilitate the identification of the osnoise source.

 - osnoise:sample_threshold: printed anytime a noise is higher than
   the configurable tolerance_ns.
 - osnoise:nmi_noise: noise from NMI, including the duration.
 - osnoise:irq_noise: noise from an IRQ, including the duration.
 - osnoise:softirq_noise: noise from a SoftIRQ, including the
   duration.
 - osnoise:thread_noise: noise from a thread, including the duration.

Note that a all the values are net values. This means that a thread
duration will not contain the duration of the IRQs that happened during
its execution, for example. The same is valid for all duration values.

Here is one example of the usage of these tracepoints::

   osnoise/8-961 [008] d.h.  5789.857532: irq_noise: local_timer:236 
start 5789.857529929 duration 1845 ns
   osnoise/8-961 [008] dNh.  5789.858408: irq_noise: local_timer:236 
start 5789.858404871 duration 2848 ns
 migration/8-54  [008] d...  5789.858413: thread_noise: migration/8:54 
start 5789.858409300 duration 3068 ns
   osnoise/8-961 [008]   5789.858413: sample_threshold: start 
5789.858404555 duration 8 us interferences 2

In this example, a noise sample of 8 microseconds was reported in the last
fine, pointing to two interferences. Looking backward in the trace, the
two previous entries were about the migration thread running after
a timer IRQ execution. The first event is not part of the noise because
it took place one millisecond before.

It is worth noticing that the sum of the duration reported in the
tracepoints is smaller than eight us reported in the
sample_threshold. The reason roots in the tracing overhead and in
the overhead of the entry and exit code that happens before and after
any interference execution. This justifies the dual approach: measuring
thread and tracing.

Cc: Jonathan Corbet 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Alexandre Chartre 
Cc: Clark Willaims 
Cc: John Kacur 
Cc: Juri Lelli 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira 

---
 Documentation/trace/osnoise_tracer.rst |  149 ++
 include/linux/ftrace_irq.h |   16 +
 include/trace/events/osnoise.h |  141 ++
 kernel/trace/Kconfig   |   34 +
 kernel/trace/Makefile  |1 +
 kernel/trace/trace.h   |9 +-
 kernel/trace/trace_entries.h   |   27 +
 kernel/trace/trace_osnoise.c   | 1714 
 kernel/trace/trace_output.c|   72 +-
 9 files changed, 2159 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/trace/osnoise_tracer.rst
 create mode 100644 include/trace/events/osnoise.h
 create mode 100644 kernel/trace/trace_osnoise.c

diff --git a/Documentation/trace/osnoise_tracer.rst 
b/Documentation/trace/osnoise_tracer.rst
new file mode 100644
index ..9a97f557317b
--- /dev/null
+++ b/Documentation/trace/osnoise_tracer.rst
@@ -0,0 +1,149 @@
+==
+OSNOISE Tracer
+==
+
+In the context of high-performance computing (HPC), the Operating System
+Noise (*osnoise*) refers to the interference experienced by an application
+due to activities inside the operating system. In the context of Linux,
+NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the
+system. Moreover, hardware-related jobs can also cause noise, for example,
+via SMIs.
+
+``hwlat_detector`` is one of the tools used to identify the most complex
+source of noise: *hardware noise*.
+
+In a nutshell, the ``hwlat_detector`` creates a thread that runs
+periodically for a given period. At the b

[RFC PATCH 4/5] tracing: Add __print_ns_to_secs() and __print_ns_without_secs() helpers

2021-04-08 Thread Daniel Bristot de Oliveira

From: Steven Rostedt 

To have nanosecond output displayed in a more human readable format, its
nicer to convert it to a seconds format (XXX.Y). The problem is that
to do so, the numbers must be divided by NSEC_PER_SEC, and moded too. But as
these numbers are 64 bit, this can not be done simply with '/' and '%'
operators, but must use do_div() instead.

Instead of performing the expensive do_div() in the hot path of the
tracepoint, it is more efficient to perform it during the output phase. But
passing in do_div() can confuse the parser, and do_div() doesn't work
exactly like a normal C function. It modifies the number in place, and we
don't want to modify the actual values in the ring buffer.

Two helper functions are now created:

  __print_ns_to_secs() and __print_ns_without_secs()

They both take a value of nanoseconds, and the former will return that
number divided by NSEC_PER_SEC, and the latter will mod it with NSEC_PER_SEC
giving a way to print a nice human readable format:

 __print_fmt("time=%llu.%09u",
__print_ns_to_secs(REC->nsec_val),
__print_ns_without_secs(REC->nsec_val))

Cc: Jonathan Corbet 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Alexandre Chartre 
Cc: Clark Willaims 
Cc: John Kacur 
Cc: Juri Lelli 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt 
Signed-off-by: Daniel Bristot de Oliveira 

---
 include/trace/trace_events.h | 25 +
 1 file changed, 25 insertions(+)

diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 8268bf747d6f..c60fd1037b91 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -33,6 +33,21 @@
static const char TRACE_SYSTEM_STRING[] =   \
__stringify(TRACE_SYSTEM)
 
+#undef __print_ns_to_secs
+#define __print_ns_to_secs(value)  \
+   ({  \
+   u64 val = (u64)value;   \
+   do_div(val, NSEC_PER_SEC);  \
+   val;\
+   })
+
+#undef __print_ns_without_secs
+#define __print_ns_without_secs(value) \
+   ({  \
+   u64 val = (u64)value;   \
+   (u32) do_div(val, NSEC_PER_SEC);\
+   })
+
 TRACE_MAKE_SYSTEM_STR();
 
 #undef TRACE_DEFINE_ENUM
@@ -736,6 +751,16 @@ static inline void ftrace_test_probe_##call(void)  
\
 #undef __print_array
 #undef __print_hex_dump
 
+/*
+ * The below is not executed in the kernel. It is only what is
+ * displayed in the print format for userspace to parse.
+ */
+#undef __print_ns_to_secs
+#define __print_ns_to_secs(val) val / 10UL
+
+#undef __print_ns_without_secs
+#define __print_ns_without_secs(val) val % 10UL
+
 #undef TP_printk
 #define TP_printk(fmt, args...) "\"" fmt "\", "  __stringify(args)
 
-- 
2.30.2

[RFC PATCH 3/5] tracing/hwlat: Implement the per-cpu mode

2021-04-08 Thread Daniel Bristot de Oliveira

Implements the per-cpu mode in which a sampling thread is created for
each cpu in the "cpus" (and tracing_mask).

The per-cpu mode has the potention to speed up the hwlat detection by
running on multiple CPUs at the same time.

Cc: Jonathan Corbet 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Alexandre Chartre 
Cc: Clark Willaims 
Cc: John Kacur 
Cc: Juri Lelli 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira 

---
 Documentation/trace/hwlat_detector.rst |   6 +-
 kernel/trace/trace_hwlat.c | 171 +++--
 2 files changed, 137 insertions(+), 40 deletions(-)

diff --git a/Documentation/trace/hwlat_detector.rst 
b/Documentation/trace/hwlat_detector.rst
index f63fdd867598..7a6fab105b29 100644
--- a/Documentation/trace/hwlat_detector.rst
+++ b/Documentation/trace/hwlat_detector.rst
@@ -85,10 +85,12 @@ the available options are:
 
  - none:do not force migration
  - round-robin: migrate across each CPU specified in cpus between each window
+ - per-cpu: create a per-cpu thread for each cpu in cpus
 
 By default, hwlat detector will also obey the tracing_cpumask, so the thread
 will be placed only in the set of cpus that is both on the hwlat detector's
 cpus and in the global tracing_cpumask file. The user can overwrite the
 cpumask by setting it manually. Changing the hwlatd affinity externally,
-e.g., via taskset tool, will disable the round-robin migration.
-
+e.g., via taskset tool, will disable the round-robin migration. In the
+per-cpu mode, the per-cpu thread (hwlatd/CPU) will be pinned to its relative
+cpu, and its affinity cannot be changed.
diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index 3818200c9e24..52968ea312df 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -34,7 +34,7 @@
  * Copyright (C) 2008-2009 Jon Masters, Red Hat, Inc. 
  * Copyright (C) 2013-2016 Steven Rostedt, Red Hat, Inc. 
  *
- * Includes useful feedback from Clark Williams 
+ * Includes useful feedback from Clark Williams 
  *
  */
 #include 
@@ -54,9 +54,6 @@ static struct trace_array *hwlat_trace;
 #define DEFAULT_SAMPLE_WIDTH   50  /* 0.5s */
 #define DEFAULT_LAT_THRESHOLD  10  /* 10us */
 
-/* sampling thread*/
-static struct task_struct *hwlat_kthread;
-
 static struct dentry *hwlat_sample_width;  /* sample width us */
 static struct dentry *hwlat_sample_window; /* sample window us */
 static struct dentry *hwlat_cpumask_dentry;/* hwlat cpus allowed */
@@ -65,19 +62,27 @@ static struct dentry *hwlat_thread_mode;/* hwlat thread 
mode */
 enum {
MODE_NONE = 0,
MODE_ROUND_ROBIN,
+   MODE_PER_CPU,
MODE_MAX
 };
 
-static char *thread_mode_str[] = { "none", "round-robin" };
+static char *thread_mode_str[] = { "none", "round-robin", "per-cpu" };
 
 /* Save the previous tracing_thresh value */
 static unsigned long save_tracing_thresh;
 
-/* NMI timestamp counters */
-static u64 nmi_ts_start;
-static u64 nmi_total_ts;
-static int nmi_count;
-static int nmi_cpu;
+/* runtime kthread data */
+struct hwlat_kthread_data {
+   struct task_struct *kthread;
+   /* NMI timestamp counters */
+   u64 nmi_ts_start;
+   u64 nmi_total_ts;
+   int nmi_count;
+   int nmi_cpu;
+};
+
+struct hwlat_kthread_data hwlat_single_cpu_data;
+DEFINE_PER_CPU(struct hwlat_kthread_data, hwlat_per_cpu_data);
 
 /* Tells NMIs to call back to the hwlat tracer to record timestamps */
 bool trace_hwlat_callback_enabled;
@@ -114,6 +119,14 @@ static struct hwlat_data {
.thread_mode= MODE_ROUND_ROBIN
 };
 
+struct hwlat_kthread_data *get_cpu_data(void)
+{
+   if (hwlat_data.thread_mode == MODE_PER_CPU)
+   return this_cpu_ptr(_per_cpu_data);
+   else
+   return _single_cpu_data;
+}
+
 static bool hwlat_busy;
 
 static void trace_hwlat_sample(struct hwlat_sample *sample)
@@ -151,7 +164,9 @@ static void trace_hwlat_sample(struct hwlat_sample *sample)
 
 void trace_hwlat_callback(bool enter)
 {
-   if (smp_processor_id() != nmi_cpu)
+   struct hwlat_kthread_data *kdata = get_cpu_data();
+
+   if (kdata->kthread)
return;
 
/*
@@ -160,13 +175,13 @@ void trace_hwlat_callback(bool enter)
 */
if (!IS_ENABLED(CONFIG_GENERIC_SCHED_CLOCK)) {
if (enter)
-   nmi_ts_start = time_get();
+   kdata->nmi_ts_start = time_get();
else
-   nmi_total_ts += time_get() - nmi_ts_start;
+   kdata->nmi_total_ts += time_get() - kdata->nmi_ts_start;
}
 
if (enter)
-   nmi_count++;
+   kdata->nmi_count++;
 }
 
 /**
@@ -178,6 +193,7 @@ void trace_hwlat_callback(bool enter)

[RFC PATCH 2/5] tracing/hwlat: Implement the mode config option

2021-04-08 Thread Daniel Bristot de Oliveira

Provides the "mode" config to the hardware latency detector. hwlatd has
two different operation modes. The default mode is the "round-robin" one,
in which a single hwlatd thread runs, migrating among the allowed CPUs in a
"round-robin" fashion. This is the current behavior.

The "none" sets the allowed cpumask for a single hwlatd thread at the
startup, but skips the round-robin, letting the scheduler handle the
migration.

In preparation to the per-cpu mode.

Cc: Jonathan Corbet 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Alexandre Chartre 
Cc: Clark Willaims 
Cc: John Kacur 
Cc: Juri Lelli 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira 

---
 Documentation/trace/hwlat_detector.rst |  21 +++-
 kernel/trace/trace_hwlat.c | 157 +++--
 2 files changed, 162 insertions(+), 16 deletions(-)

diff --git a/Documentation/trace/hwlat_detector.rst 
b/Documentation/trace/hwlat_detector.rst
index 86f973a7763c..f63fdd867598 100644
--- a/Documentation/trace/hwlat_detector.rst
+++ b/Documentation/trace/hwlat_detector.rst
@@ -76,10 +76,19 @@ in /sys/kernel/tracing:
  - hwlat_detector/width - specified amount of time to spin within window 
(usecs)
  - hwlat_detector/window- amount of time between (width) runs (usecs)
  - hwlat_detector/cpus  - the CPUs to move the hwlat thread across
+ - hwlat_detector/mode - the thread mode
+
+By default, the hwlat detector's kernel thread will migrate across each CPU
+specified in cpumask at the beginning of a new window, in a round-robin
+fashion. This behavior can be changed by changing the thread mode,
+the available options are:
+
+ - none:do not force migration
+ - round-robin: migrate across each CPU specified in cpus between each window
+
+By default, hwlat detector will also obey the tracing_cpumask, so the thread
+will be placed only in the set of cpus that is both on the hwlat detector's
+cpus and in the global tracing_cpumask file. The user can overwrite the
+cpumask by setting it manually. Changing the hwlatd affinity externally,
+e.g., via taskset tool, will disable the round-robin migration.
 
-The hwlat detector's kernel thread will migrate across each CPU specified in
-cpus list between each window. The hwlat detector will also obey the
-tracing_cpumask, so the thread will migrate on the set of cpus that is
-both on its cpus list and in the global tracing_cpumask file.
-To limit the migration, either modify cpumask, or modify the hwlat kernel
-thread (named [hwlatd]) CPU affinity directly, and the migration will stop.
diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index deecb93f97f2..3818200c9e24 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -60,6 +60,15 @@ static struct task_struct *hwlat_kthread;
 static struct dentry *hwlat_sample_width;  /* sample width us */
 static struct dentry *hwlat_sample_window; /* sample window us */
 static struct dentry *hwlat_cpumask_dentry;/* hwlat cpus allowed */
+static struct dentry *hwlat_thread_mode;   /* hwlat thread mode */
+
+enum {
+   MODE_NONE = 0,
+   MODE_ROUND_ROBIN,
+   MODE_MAX
+};
+
+static char *thread_mode_str[] = { "none", "round-robin" };
 
 /* Save the previous tracing_thresh value */
 static unsigned long save_tracing_thresh;
@@ -97,11 +106,16 @@ static struct hwlat_data {
u64 sample_window;  /* total sampling window (on+off) */
u64 sample_width;   /* active sampling portion of window */
 
+   int thread_mode;/* thread mode */
+
 } hwlat_data = {
.sample_window  = DEFAULT_SAMPLE_WINDOW,
.sample_width   = DEFAULT_SAMPLE_WIDTH,
+   .thread_mode= MODE_ROUND_ROBIN
 };
 
+static bool hwlat_busy;
+
 static void trace_hwlat_sample(struct hwlat_sample *sample)
 {
struct trace_array *tr = hwlat_trace;
@@ -337,7 +351,8 @@ static int kthread_fn(void *data)
 
while (!kthread_should_stop()) {
 
-   move_to_next_cpu();
+   if (hwlat_data.thread_mode == MODE_ROUND_ROBIN)
+   move_to_next_cpu();
 
local_irq_disable();
get_sample();
@@ -375,6 +390,14 @@ static int start_kthread(struct trace_array *tr)
if (hwlat_kthread)
return 0;
 
+
+   kthread = kthread_create(kthread_fn, NULL, "hwlatd");
+   if (IS_ERR(kthread)) {
+   pr_err(BANNER "could not start sampling thread\n");
+   return -ENOMEM;
+   }
+
+
/* Just pick the first CPU on first iteration */
get_online_cpus();
/*
@@ -386,16 +409,14 @@ static int start_kthread(struct trace_array *tr)
 */
cpumask_and(current_mask, cpu_online_mask, current_mask);
put_online_cpus();
-

[RFC PATCH 1/5] tracing/hwlat: Add a cpus file specific for hwlat_detector

2021-04-08 Thread Daniel Bristot de Oliveira

Provides a "cpus" interface to the hardware latency detector. By
default, it lists all CPUs, allowing hwlatd threads to run on any online
CPU of the system.

It serves to restrict the execution of hwlatd to the set of CPUs writing
via this interface. Note that hwlatd also respects the "tracing_cpumask."
Hence, hwlatd threads will run only on the set of CPUs allowed here AND
on "tracing_cpumask."

Why not keep just "tracing_cpumask"? Because the user might be interested
in tracing what is running on other CPUs. For instance, one might run
hwlatd in one HT CPU while observing what is running on the sibling HT
CPU. The cpu list format is also more intuitive.

Also in preparation to the per-cpu mode.

Cc: Jonathan Corbet 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Alexandre Chartre 
Cc: Clark Willaims 
Cc: John Kacur 
Cc: Juri Lelli 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira 
---
 Documentation/trace/hwlat_detector.rst |  14 +--
 kernel/trace/trace_hwlat.c | 125 -
 2 files changed, 131 insertions(+), 8 deletions(-)

diff --git a/Documentation/trace/hwlat_detector.rst 
b/Documentation/trace/hwlat_detector.rst
index 5739349649c8..86f973a7763c 100644
--- a/Documentation/trace/hwlat_detector.rst
+++ b/Documentation/trace/hwlat_detector.rst
@@ -73,11 +73,13 @@ in /sys/kernel/tracing:
 
  - tracing_threshold   - minimum latency value to be considered (usecs)
  - tracing_max_latency - maximum hardware latency actually observed (usecs)
- - tracing_cpumask - the CPUs to move the hwlat thread across
- - hwlat_detector/width- specified amount of time to spin within 
window (usecs)
- - hwlat_detector/window   - amount of time between (width) runs (usecs)
+ - hwlat_detector/width - specified amount of time to spin within window 
(usecs)
+ - hwlat_detector/window- amount of time between (width) runs (usecs)
+ - hwlat_detector/cpus  - the CPUs to move the hwlat thread across
 
 The hwlat detector's kernel thread will migrate across each CPU specified in
-tracing_cpumask between each window. To limit the migration, either modify
-tracing_cpumask, or modify the hwlat kernel thread (named [hwlatd]) CPU
-affinity directly, and the migration will stop.
+cpus list between each window. The hwlat detector will also obey the
+tracing_cpumask, so the thread will migrate on the set of cpus that is
+both on its cpus list and in the global tracing_cpumask file.
+To limit the migration, either modify cpumask, or modify the hwlat kernel
+thread (named [hwlatd]) CPU affinity directly, and the migration will stop.
diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index 34dc1a712dcb..deecb93f97f2 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -59,6 +59,7 @@ static struct task_struct *hwlat_kthread;
 
 static struct dentry *hwlat_sample_width;  /* sample width us */
 static struct dentry *hwlat_sample_window; /* sample window us */
+static struct dentry *hwlat_cpumask_dentry;/* hwlat cpus allowed */
 
 /* Save the previous tracing_thresh value */
 static unsigned long save_tracing_thresh;
@@ -272,6 +273,7 @@ static int get_sample(void)
return ret;
 }
 
+static struct cpumask hwlat_cpumask;
 static struct cpumask save_cpumask;
 static bool disable_migrate;
 
@@ -292,7 +294,14 @@ static void move_to_next_cpu(void)
goto disable;
 
get_online_cpus();
-   cpumask_and(current_mask, cpu_online_mask, tr->tracing_cpumask);
+   /*
+* Run only on CPUs in which trace and hwlat are allowed to run.
+*/
+   cpumask_and(current_mask, tr->tracing_cpumask, _cpumask);
+   /*
+* And the CPU is online.
+*/
+   cpumask_and(current_mask, cpu_online_mask, current_mask);
next_cpu = cpumask_next(smp_processor_id(), current_mask);
put_online_cpus();
 
@@ -368,7 +377,14 @@ static int start_kthread(struct trace_array *tr)
 
/* Just pick the first CPU on first iteration */
get_online_cpus();
-   cpumask_and(current_mask, cpu_online_mask, tr->tracing_cpumask);
+   /*
+* Run only on CPUs in which trace and hwlat are allowed to run.
+*/
+   cpumask_and(current_mask, tr->tracing_cpumask, _cpumask);
+   /*
+* And the CPU is online.
+*/
+   cpumask_and(current_mask, cpu_online_mask, current_mask);
put_online_cpus();
next_cpu = cpumask_first(current_mask);
 
@@ -402,6 +418,94 @@ static void stop_kthread(void)
hwlat_kthread = NULL;
 }
 
+/*
+ * hwlat_cpus_read - Read function for reading the "cpus" file
+ * @filp: The active open file structure
+ * @ubuf: The userspace provided buffer to read value into
+ * @cnt: The maximum number of bytes to read
+ * @ppos: The current "file" position
+ *
+ * Prin

[RFC PATCH 0/5] hwlat improvements and osnoise tracer

2021-04-08 Thread Daniel Bristot de Oliveira

This series proposes a set of improvements and new features for the
tracing subsystem to facilitate the debugging of low latency
deployments.

Currently, hwlat runs on a single CPU at a time, migrating across a
set of CPUs in a round-robin fashion. The first three patches are
changes made to allow hwlat to run on multiple CPUs in parallel,
increasing the chances of detecting a hardware latency.

The fourth patch is a helper to print a timestamp in a u64 in
seconds.nanoseconds format on tracepoints.

The fifth patch proposes a new tracer named osnoise and aims to help
users of isolcpus= (or a similar method) to measure how much noise the
OS and the hardware add to the isolated application. The osnoise tracer
bases on the hwlat detector code. The difference is that, instead of
sampling with interrupts disabled, the osnoise tracer samples the CPU with
interrupts and preemption enabled. In this way, the sampling thread will
suffer any source of noise from the OS. The detection and classification
of the type of noise are then made by observing the entry points of NMIs,
IRQs, SoftIRQs, and threads. If none of these sources of noise is detected,
the tool associates the noise with the hardware. The tool periodically
prints a status, printing the total noise of the period, the max single
noise observed, the percentage of CPU available for the task, along with
the counters of each source of the noise. To debug the sources of noise,
the tracer also adds a set of tracepoints that print any NMI, IRQ, SofIRQ,
and thread occurrence. These tracepoints print the starting time and the
noise's net duration at the end of the noise. In this way, it reduces the
number of tracepoints (one instead of two) and the need to manually
accounting the contribution of each noise independently.

Daniel Bristot de Oliveira (4):
  tracing/hwlat: Add a cpus file specific for hwlat_detector
  tracing/hwlat: Implement the mode config option
  tracing/hwlat: Implement the per-cpu mode
  tracing: Add the osnoise tracer

Steven Rostedt (1):
  tracing: Add __print_ns_to_secs() and __print_ns_without_secs()
helpers

 Documentation/trace/hwlat_detector.rst |   29 +-
 Documentation/trace/osnoise_tracer.rst |  149 ++
 include/linux/ftrace_irq.h |   16 +
 include/trace/events/osnoise.h |  141 ++
 include/trace/trace_events.h   |   25 +
 kernel/trace/Kconfig   |   34 +
 kernel/trace/Makefile  |1 +
 kernel/trace/trace.h   |9 +-
 kernel/trace/trace_entries.h   |   27 +
 kernel/trace/trace_hwlat.c |  445 +-
 kernel/trace/trace_osnoise.c   | 1714 
 kernel/trace/trace_output.c|   72 +-
 12 files changed, 2604 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/trace/osnoise_tracer.rst
 create mode 100644 include/trace/events/osnoise.h
 create mode 100644 kernel/trace/trace_osnoise.c

-- 
2.30.2

Re: [PATCH] sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()

2021-01-19 Thread Daniel Bristot de Oliveira

On 1/19/21 9:35 AM, Dietmar Eggemann wrote:
> dl_add_task_root_domain() is called during sched domain rebuild:
> 
>   rebuild_sched_domains_locked()
> partition_and_rebuild_sched_domains()
>   rebuild_root_domains()
>  for all top_cpuset descendants:
>update_tasks_root_domain()
>  for all tasks of cpuset:
>dl_add_task_root_domain()
> 
> Change it so that only the task pi lock is taken to check if the task
> has a SCHED_DEADLINE (DL) policy. In case that p is a DL task take the
> rq lock as well to be able to safely de-reference root domain's DL
> bandwidth structure.
> 
> Most of the tasks will have another policy (namely SCHED_NORMAL) and
> can now bail without taking the rq lock.
> 
> One thing to note here: Even in case that there aren't any DL user
> tasks, a slow frequency switching system with cpufreq gov schedutil has
> a DL task (sugov) per frequency domain running which participates in DL
> bandwidth management.
> 
> Reviewed-by: Quentin Perret 
> Signed-off-by: Dietmar Eggemann 

Reviewed-by: Daniel Bristot de Oliveira 

Thanks!
-- Daniel

BUG: sleeping function called from invalid context at kernel/stop_machine.c:135

2021-01-19 Thread Daniel Bristot de Oliveira

Hi Waiman,

Are you aware of this issue:
- %< -
[   88.307857] BUG: sleeping function called from invalid context at 
kernel/stop_machine.c:135
[   88.308796] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 801, 
name: sh
[   88.309785] 6 locks held by sh/801:
[   88.310265]  #0: 9f008c575460 (sb_writers#7){.+.+}-{0:0}, at: 
ksys_write+0x58/0xd0
[   88.310906]  #1: 9f008e9dd088 (>mutex){+.+.}-{4:4}, at: 
kernfs_fop_write+0xa5/0x1c0
[   88.311672]  #2: 9f0092164a88 (kn->active#195){.+.+}-{0:0}, at: 
kernfs_fop_write+0xad/0x1c0
[   88.312456]  #3: bac68310 (cpu_hotplug_lock){}-{0:0}, at: 
sched_partition_write+0x72/0x2f0
[   88.313280]  #4: bae37090 (_rwsem){}-{0:0}, at: 
sched_partition_write+0x7e/0x2f0
[   88.314095]  #5: bad89140 (rcu_read_lock){}-{1:3}, at: 
update_sibling_cpumasks+0x5/0x140
[   88.314806] Preemption disabled at:
[   88.314810] [] preempt_schedule_thunk+0x16/0x18
[   88.315815] CPU: 1 PID: 801 Comm: sh Not tainted 5.10.0-rc5+ #10
[   88.316203] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.13.0-2.fc32 04/01/2014
[   88.316714] Call Trace:
[   88.316875]  dump_stack+0x8b/0xb0
[   88.317087]  ___might_sleep.cold+0x102/0x116
[   88.317354]  stop_one_cpu+0x82/0xa0
[   88.317578]  ? set_cpus_allowed_ptr+0x10/0x10
[   88.317858]  __set_cpus_allowed_ptr+0x1e6/0x1f0
[   88.318144]  update_tasks_cpumask+0x25/0x50
[   88.318415]  update_cpumasks_hier+0x257/0x840
[   88.318687]  update_sibling_cpumasks+0x96/0x140
[   88.318968]  update_prstate+0x1a0/0x1f0
[   88.319210]  sched_partition_write+0x9f/0x2f0
[   88.319482]  kernfs_fop_write+0xdc/0x1c0
[   88.319730]  vfs_write+0xea/0x3b0
[   88.319943]  ksys_write+0x58/0xd0
[   88.320156]  do_syscall_64+0x33/0x40
[   88.320382]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   88.320692] RIP: 0033:0x7fbbd79be537
[   88.320915] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 
f3 0f 1e fa 64 8b 04
 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 51 c3
 48 83 ec 28 48 89 54 24 18 48 89 74 24
[   88.322028] RSP: 002b:7ffd44cc8398 EFLAGS: 0246 ORIG_RAX: 
0001
[   88.322479] RAX: ffda RBX: 0005 RCX: 7fbbd79be537
[   88.322910] RDX: 0005 RSI: 558ae69200a0 RDI: 0001
[   88.323342] RBP: 558ae69200a0 R08: 000a R09: 0004
[   88.323775] R10: 558ae6921ba0 R11: 0246 R12: 0005
[   88.325046] R13: 7fbbd7a90500 R14: 0005 R15: 7fbbd7a90700
- >% -

it happens when I run this (from Dietmar):
- %< -
cd /sys/fs/cgroup/

echo $$ > cgroup.procs

echo +cpuset > cgroup.subtree_control
chrt -d --sched-period 10 --sched-runtime 1 0 sleep 500 &
PID1=$!
chrt -d --sched-period 10 --sched-runtime 1 0 sleep 500 &
PID2=$!

sleep 3

mkdir A
echo 0 > ./A/cpuset.mems
echo 0 > ./A/cpuset.cpus

echo $PID2 > ./A/cgroup.procs
echo root > ./A/cpuset.cpus.partition
echo $PID2 > ./A/cgroup.procs

cat /proc/$PID1/status | grep Cpus_allowed_list | awk '{print $2}'

cat /proc/$PID2/status | grep Cpus_allowed_list | awk '{print $2}'
- >% -

with this kernel config:
https://bristot.me/lkml/20200119/config.txt

on Fedora 32 (x86 64 vm).

full dmesg here:
https://bristot.me/lkml/20200119/dmesg.txt

-- Daniel

Re: [PATCH 4/6] sched/deadline: Block DL tasks on non-exclusive cpuset if bandwitdh control is enable

2021-01-19 Thread Daniel Bristot de Oliveira

On 1/14/21 4:51 PM, Dietmar Eggemann wrote:
> On 12/01/2021 16:53, Daniel Bristot de Oliveira wrote:
>> The current SCHED_DEADLINE design supports only global scheduler,
>> or variants of it, i.e., clustered and partitioned, via cpuset config.
>> To enable the partitioning of a system with clusters of CPUs, the
>> documentation advises the usage of exclusive cpusets, creating an
>> exclusive root_domain for the cpuset.
>>
>> Attempts to change the cpu affinity of a thread to a cpu mask different
>> from the root domain results in an error. For instance:
> 
> [...]
> 
>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index 788a391657a5..c221e14d5b86 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -2878,6 +2878,13 @@ int dl_task_can_attach(struct task_struct *p,
>>  if (cpumask_empty(cs_cpus_allowed))
>>  return 0;
>>  
>> +/*
>> + * Do not allow moving tasks to non-exclusive cpusets
>> + * if bandwidth control is enabled.
>> + */
>> +if (dl_bandwidth_enabled() && !exclusive)
>> +return -EBUSY;
>> +
>>  /*
>>   * The task is not moving to another root domain, so it is
>>   * already accounted.
>>
> 
> But doesn't this mean you only have to set this cgroup exclusive/root to
> run into the same issue:
> 
> with this patch:
> 
> cgroupv1:
> 
> root@juno:/sys/fs/cgroup/cpuset# chrt -d --sched-period 10
> --sched-runtime 1 0 sleep 500 &
> [1] 1668
> root@juno:/sys/fs/cgroup/cpuset# PID1=$!
> 
> root@juno:/sys/fs/cgroup/cpuset# chrt -d --sched-period 10
> --sched-runtime 1 0 sleep 500 &
> [2] 1669
> root@juno:/sys/fs/cgroup/cpuset# PID2=$!
> 
> root@juno:/sys/fs/cgroup/cpuset# mkdir A
> 
> root@juno:/sys/fs/cgroup/cpuset# echo 0 > ./A/cpuset.mems
> root@juno:/sys/fs/cgroup/cpuset# echo 0 > ./A/cpuset.cpus
> 
> root@juno:/sys/fs/cgroup/cpuset# echo $PID2 > ./A/cgroup.procs
> -bash: echo: write error: Device or resource busy
> 
> root@juno:/sys/fs/cgroup/cpuset# echo 1 > ./A/cpuset.cpu_exclusive
> 
> root@juno:/sys/fs/cgroup/cpuset# echo $PID2 > ./A/cgroup.procs
> 
> root@juno:/sys/fs/cgroup/cpuset# cat /proc/$PID1/status | grep
> Cpus_allowed_list | awk '{print $2}'
> 0-5
> root@juno:/sys/fs/cgroup/cpuset# cat /proc/$PID2/status | grep
> Cpus_allowed_list | awk '{print $2}'
> 0

On CPU v1 we also need to disable the load balance to create a root domain, 
right?

> cgroupv2:

Yeah, I see your point. I was seeing a different output because of Fedora
default's behavior of adding the tasks to the system.slice/user.slice...

doing:

> root@juno:/sys/fs/cgroup# echo +cpuset > cgroup.subtree_control

# echo $$ > cgroup.procs

> root@juno:/sys/fs/cgroup# chrt -d --sched-period 10
> --sched-runtime 1 0 sleep 500 &
> [1] 1687
> root@juno:/sys/fs/cgroup# PID1=$!
> 
> root@juno:/sys/fs/cgroup# chrt -d --sched-period 10
> --sched-runtime 1 0 sleep 500 &
> [2] 1688
> root@juno:/sys/fs/cgroup# PID2=$!
> 
> root@juno:/sys/fs/cgroup# mkdir A
> 
> root@juno:/sys/fs/cgroup# echo 0 > ./A/cpuset.mems
> root@juno:/sys/fs/cgroup# echo 0 > ./A/cpuset.cpus
> 
> root@juno:/sys/fs/cgroup# echo $PID2 > ./A/cgroup.procs
> -bash: echo: write error: Device or resource busy
> 
> root@juno:/sys/fs/cgroup# echo root > ./A/cpuset.cpus.partition
> 
> root@juno:/sys/fs/cgroup# echo $PID2 > ./A/cgroup.procs
> 
> root@juno:/sys/fs/cgroup# cat /proc/$PID1/status | grep
> Cpus_allowed_list | awk '{print $2}'
> 0-5
> root@juno:/sys/fs/cgroup# cat /proc/$PID2/status | grep
> Cpus_allowed_list | awk '{print $2}'
> 0

makes me see the same behavior. This will require further analysis.

So, my plan now is to split this patch set into two, one with patches 1,3,5, and
6, which fixes the most "easy" part of the problems, and another with 2 and 4
that will require further investigation (discussed this with Dietmar already).

Thoughts?

-- Daniel

Re: [PATCH 6/6] sched/deadline: Fixes cpu/rd/dl_bw references for suspended tasks

2021-01-18 Thread Daniel Bristot de Oliveira

On 1/15/21 3:36 PM, Dietmar Eggemann wrote:
> On 12/01/2021 16:53, Daniel Bristot de Oliveira wrote:
> 
> [...]
> 
>> - %< -
>>   #!/bin/bash
>>   # Enter on the cgroup directory
>>   cd /sys/fs/cgroup/
>>
>>   # Check it if is cgroup v2 and enable cpuset
>>   if [ -e cgroup.subtree_control ]; then
>>  # Enable cpuset controller on cgroup v2
>>  echo +cpuset > cgroup.subtree_control
>>   fi
>>
>>   echo LOG: create an exclusive cpuset and assigned the CPU 0 to it
>>   # Create cpuset groups
>>   rmdir dl-group &> /dev/null
>>   mkdir dl-group
>>
>>   # Restrict the task to the CPU 0
>>   echo 0 > dl-group/cpuset.mems
>>   echo 0 > dl-group/cpuset.cpus
>>   echo root >  dl-group/cpuset.cpus.partition
>>
>>   echo LOG: dispatching a regular task
>>   sleep 100 &
>>   CPUSET_PID="$!"
>>
>>   # let it settle down
>>   sleep 1
>>
>>   # Assign the second task to the cgroup
> 
> There is only one task 'CPUSET_PID' involved here?

Ooops, yep, I will remove the "second" part.

>>   echo LOG: moving the second DL task to the cpuset
>>   echo "$CPUSET_PID" > dl-group/cgroup.procs 2> /dev/null
>>
>>   CPUSET_ALLOWED=`cat /proc/$CPUSET_PID/status | grep Cpus_allowed_list | 
>> awk '{print $2}'`
>>
>>   chrt -p -d --sched-period 10 --sched-runtime 1 0 
>> $CPUSET_PID
>>   ACCEPTED=$?
>>
>>   if [ $ACCEPTED == 0 ]; then
>>  echo PASS: the task became DL
>>   else
>>  echo FAIL: the task was rejected as DL
>>   fi
> 
> [...]
> 
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 5961a97541c2..3c2775e6869f 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -5905,15 +5905,15 @@ static int __sched_setscheduler(struct task_struct 
>> *p,
>>  #ifdef CONFIG_SMP
>>  if (dl_bandwidth_enabled() && dl_policy(policy) &&
>>  !(attr->sched_flags & SCHED_FLAG_SUGOV)) {
>> -cpumask_t *span = rq->rd->span;
>> +struct root_domain *rd = dl_task_rd(p);
>>  
>>  /*
>>   * Don't allow tasks with an affinity mask smaller than
>>   * the entire root_domain to become SCHED_DEADLINE. We
>>   * will also fail if there's no bandwidth available.
>>   */
>> -if (!cpumask_subset(span, p->cpus_ptr) ||
>> -rq->rd->dl_bw.bw == 0) {
>> +if (!cpumask_subset(rd->span, p->cpus_ptr) ||
>> +rd->dl_bw.bw == 0) {
>>  retval = -EPERM;
>>  goto unlock;
>>  }
>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index c221e14d5b86..1f6264cb8867 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -2678,8 +2678,8 @@ int sched_dl_overflow(struct task_struct *p, int 
>> policy,
>>  u64 period = attr->sched_period ?: attr->sched_deadline;
>>  u64 runtime = attr->sched_runtime;
>>  u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
>> -int cpus, err = -1, cpu = task_cpu(p);
>> -struct dl_bw *dl_b = dl_bw_of(cpu);
>> +int cpus, err = -1, cpu = dl_task_cpu(p);
>> +struct dl_bw *dl_b = dl_task_root_bw(p);
>>  unsigned long cap;
>>  
>>  if (attr->sched_flags & SCHED_FLAG_SUGOV)
>>
> 
> Wouldn't it be sufficient to just introduce dl_task_cpu() and use the
> correct cpu to get rd->span or struct dl_bw in __sched_setscheduler()
> -> sched_dl_overflow()?

While I think that having the dl_task_rd() makes the code more intuitive, I have
no problem on not adding it...

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5961a97541c2..0573f676696a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5905,7 +5905,8 @@ static int __sched_setscheduler(struct task_struct *p,
>  #ifdef CONFIG_SMP
> if (dl_bandwidth_enabled() && dl_policy(policy) &&
> !(attr->sched_flags & SCHED_FLAG_SUGOV)) {
> -   cpumask_t *span = rq->rd->span;
> +   int cpu = dl_task_cpu(p);
> +   cpumask_t *span = cpu_rq(cpu)->rd->span;
>  
&g

Re: [PATCH 3/6] sched/deadline: Allow DL tasks on empty (cgroup v2) cpusets

2021-01-18 Thread Daniel Bristot de Oliveira

On 1/14/21 1:12 PM, Dietmar Eggemann wrote:
> On 12/01/2021 16:53, Daniel Bristot de Oliveira wrote:
>> cgroups v2 allows the cpuset controller to be enabled/disabled on
>> demand. On Fedora 32, cpuset is disabled by default. To enable it,
>> a user needs to:
>>
>>   # cd /sys/fs/cgroup/
>>   # echo +cpuset > cgroup.subtree_control
>>
>> Existing cgroups will expose the cpuset interface (e.g., cpuset.cpus
>> file). By default, cpuset.cpus has no CPU assigned, which means that
>> existing tasks will move to a cpuset without cpus.
>>
>> With that in mind, look what happens if a SCHED_DEADLINE task exists
>> on any cgroup (user.slice by default on Fedora):
>>
>> - %< -
>>   # chrt -d --sched-period 10 --sched-runtime 1 0 sleep 100 &
> 
> Like you mentioned above, to see the issue the DL task has to be moved
> into the cgroup (e.g. user.slice) here:
> 
> echo $PID > /sys/fs/cgroup/user.slice/cgroup.procs

Ops, I "forgot" to add this step because Fedora/systemd does it by default.

I will add this to the log.

>>   # cd /sys/fs/cgroup/
>>   # echo '+cpuset' > cgroup.subtree_control
>>   [   65.384041] BUG: unable to handle page fault for address: 
>> b720f7e0
>>   [   65.384551] #PF: supervisor read access in kernel mode
>>   [   65.384923] #PF: error_code(0x) - not-present page
>>   [   65.385298] PGD 61a15067 P4D 61a15067 PUD 61a16063 PMD 800f9ddff062
>>   [   65.385781] Oops:  [#1] SMP PTI
>>   [   65.386042] CPU: 0 PID: 799 Comm: sh Not tainted 5.10.0-rc3 #1
>>   [   65.386461] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
>> 1.13.0-2.fc32 04/01/2014
>>   [   65.387077] RIP: 0010:dl_task_can_attach+0x40/0x250
>>   [   65.387429] Code: 54 55 53 48 83 ec 18 48 89 3c 24 bf ff ff ff ff e8 05 
>> a2 52 00
>>4c 63 f0 48 c7 c5 00 9e 02 00 4a 8b 04 f5 00 09 47 b6 
>> 48 89 ea
>><4c> 8b a4 10 e0 09 00 00 49 8d 44 24 40 48 89 c7 48 
>> 89 44 24
>>08 e8
>>   [   65.388768] RSP: 0018:aee8c056fcd8 EFLAGS: 00010283
>>   [   65.389148] RAX: b71e5000 RBX: aee8c056fdd0 RCX: 
>> 0040
>>   [   65.389661] RDX: 00029e00 RSI: 9db202534e48 RDI: 
>> b6d3a3e0
>>   [   65.390174] RBP: 00029e00 R08:  R09: 
>> 0004
>>   [   65.390686] R10: 0001 R11: ffa6fbff R12: 
>> aee8c056fbf0
>>   [   65.391196] R13: 9db2024e1400 R14: 0004 R15: 
>> 9db20ebb31e0
>>   [   65.391710] FS:  7f6df41b1740() GS:9db377c0() 
>> knlGS:
>>   [   65.392289] CS:  0010 DS:  ES:  CR0: 80050033
>>   [   65.392705] CR2: b720f7e0 CR3: 00010680a003 CR4: 
>> 00370ef0
>>   [   65.393220] DR0:  DR1:  DR2: 
>> 
>>   [   65.393732] DR3:  DR6: fffe0ff0 DR7: 
>> 0400
>>   [   65.394244] Call Trace:
>>   [   65.394437]  cpuset_can_attach+0x8b/0x110
>>   [   65.394732]  cgroup_migrate_execute+0x70/0x430
>>   [   65.395057]  cgroup_update_dfl_csses+0x222/0x230
>>   [   65.395392]  cgroup_subtree_control_write+0x2c6/0x3c0
>>   [   65.395759]  kernfs_fop_write+0xce/0x1b0
>>   [   65.396048]  vfs_write+0xc2/0x230
>>   [   65.396291]  ksys_write+0x4f/0xc0
>>   [   65.396533]  do_syscall_64+0x33/0x40
>>   [   65.396797]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>   [   65.397166] RIP: 0033:0x7f6df42a6537
>>   [   65.397428] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 
>> 00 f3 0f
>>1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 
>> 00 0f 05
>><48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 
>> 18 48 89
>>74 24
>>   [   65.398766] RSP: 002b:7ffee4128018 EFLAGS: 0246 ORIG_RAX: 
>> 0001
>>   [   65.399838] RAX: ffda RBX: 0008 RCX: 
>> 7f6df42a6537
>>   [   65.400923] RDX: 0008 RSI: 55b3f7e549e0 RDI: 
>> 0001
>>   [   65.402003] RBP: 55b3f7e549e0 R08: 000a R09: 
>> 0007
>>   [   65.403082] R10: 0004 R11: 0246 R12: 
>> 0008
>>   [   65.404156] R13: 7f6df4378500 R14: 0008 R15: 
>> 7f6df4378700
>>   [   65.405218] Modules linked in: 
>>   [   65.414172] C

[PATCH 6/6] sched/deadline: Fixes cpu/rd/dl_bw references for suspended tasks

2021-01-12 Thread Daniel Bristot de Oliveira

__set_cpus_allowed_ptr() migrates running or runnable, setting the
task's cpu accordingly. The CPU is not set when the task is not
runnable because of complications on the hotplug code. The
task cpu will be updated in the next wakeup anyway.

However, this creates a problem for the usage of task_cpu(p), which
might point the task to a CPU in which it cannot run, or worse,
a runqueue/root_domain it does not belong to, causing some
odd errors. For example, the script below shows that a sleeping
task cannot become SCHED_DEADLINE if they moved to another (exclusive)
cpuset:

- %< -
  #!/bin/bash
  # Enter on the cgroup directory
  cd /sys/fs/cgroup/

  # Check it if is cgroup v2 and enable cpuset
  if [ -e cgroup.subtree_control ]; then
# Enable cpuset controller on cgroup v2
echo +cpuset > cgroup.subtree_control
  fi

  echo LOG: create an exclusive cpuset and assigned the CPU 0 to it
  # Create cpuset groups
  rmdir dl-group &> /dev/null
  mkdir dl-group

  # Restrict the task to the CPU 0
  echo 0 > dl-group/cpuset.mems
  echo 0 > dl-group/cpuset.cpus
  echo root >  dl-group/cpuset.cpus.partition

  echo LOG: dispatching a regular task
  sleep 100 &
  CPUSET_PID="$!"

  # let it settle down
  sleep 1

  # Assign the second task to the cgroup
  echo LOG: moving the second DL task to the cpuset
  echo "$CPUSET_PID" > dl-group/cgroup.procs 2> /dev/null

  CPUSET_ALLOWED=`cat /proc/$CPUSET_PID/status | grep Cpus_allowed_list | awk 
'{print $2}'`

  chrt -p -d --sched-period 10 --sched-runtime 1 0 $CPUSET_PID
  ACCEPTED=$?

  if [ $ACCEPTED == 0 ]; then
echo PASS: the task became DL
  else
echo FAIL: the task was rejected as DL
  fi

  # Just ignore the clean up
  exec > /dev/null 2>&1
  kill -9 $CPUSET_PID
  kill -9 $ROOT_PID
  rmdir dl-group
- >% -

Long story short: the sleep task is (not runnable) on a cpu != 0. After
moving to a cpuset with only the CPU 0, task_cpu() returns a cpu that
does not belong to the cpuset the task is in, and the task is rejected
in this if:

- %< -
__sched_setscheduler():
[...]
rq = task_rq_lock(p, ); <-- uses task_cpu(), that points to
   <-- the old cpu.
[...]
if (dl_bandwidth_enabled() && dl_policy(policy) &&
!(attr->sched_flags & SCHED_FLAG_SUGOV)) {
cpumask_t *span = rq->rd->span;<--- wrong rd!

/*
 * Don't allow tasks with an affinity mask smaller than
 * the entire root_domain to become SCHED_DEADLINE. We
 * will also fail if there's no bandwidth available.
 */
if (!cpumask_subset(span, p->cpus_ptr) ||
rq->rd->dl_bw.bw == 0) {
retval = -EPERM;   <--- returns here.
goto unlock;
}
}
- >% -

Because the rq, and so the root domain, corresponding to the ones of
the CPU in which the sleep command went to... sleep, not the ones
it will run in the next wakeup because of its affinity.

To avoid this problem, use the dl_task* helpers that return the task
cpu, root domain, and the "root" dl_bw, aware of the status of
task->cpu.

Reported-by: Marco Perronet 
Signed-off-by: Daniel Bristot de Oliveira 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Li Zefan 
Cc: Tejun Heo 
Cc: Johannes Weiner 
Cc: Valentin Schneider 
Cc: linux-kernel@vger.kernel.org
Cc: cgro...@vger.kernel.org
---
 kernel/sched/core.c | 6 +++---
 kernel/sched/deadline.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5961a97541c2..3c2775e6869f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5905,15 +5905,15 @@ static int __sched_setscheduler(struct task_struct *p,
 #ifdef CONFIG_SMP
if (dl_bandwidth_enabled() && dl_policy(policy) &&
!(attr->sched_flags & SCHED_FLAG_SUGOV)) {
-   cpumask_t *span = rq->rd->span;
+   struct root_domain *rd = dl_task_rd(p);
 
/*
 * Don't allow tasks with an affinity mask smaller than
 * the entire root_domain to become SCHED_DEADLINE. We
 * will also fail if there's no bandwidth available.
 */
-   if (!cpumask_subset(span, p->cpus_ptr) ||
-   rq->rd->dl_bw.bw ==

[PATCH 2/6] sched/deadline: Inform dl_task_can_attach() if the cpuset is exclusive

2021-01-12 Thread Daniel Bristot de Oliveira

In preparation for blocking SCHED_DEADLINE tasks on non-exclusive cpuset
if bandwidth control is enabled.

No functional changes.

Signed-off-by: Daniel Bristot de Oliveira 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Li Zefan 
Cc: Tejun Heo 
cc: Johannes Weiner 
Cc: Valentin Schneider 
Cc: linux-kernel@vger.kernel.org
Cc: cgro...@vger.kernel.org
---
 include/linux/sched.h   | 2 +-
 kernel/cgroup/cpuset.c  | 5 -
 kernel/sched/core.c | 4 ++--
 kernel/sched/deadline.c | 3 ++-
 kernel/sched/sched.h| 2 +-
 5 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 942b87f80cc7..9525fbe032c9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1654,7 +1654,7 @@ current_restore_flags(unsigned long orig_flags, unsigned 
long flags)
 }
 
 extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct 
cpumask *trial);
-extern int task_can_attach(struct task_struct *p, const struct cpumask 
*cs_cpus_allowed);
+extern int task_can_attach(struct task_struct *p, const struct cpumask 
*cs_cpus_allowed, int exclusive);
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask 
*new_mask);
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask 
*new_mask);
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 53c70c470a38..c6513dfabd3d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2132,6 +2132,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
struct cgroup_subsys_state *css;
struct cpuset *cs;
struct task_struct *task;
+   int exclusive;
int ret;
 
/* used later by cpuset_attach() */
@@ -2146,8 +2147,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
(cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)))
goto out_unlock;
 
+   exclusive = is_cpu_exclusive(cs);
+
cgroup_taskset_for_each(task, css, tset) {
-   ret = task_can_attach(task, cs->cpus_allowed);
+   ret = task_can_attach(task, cs->cpus_allowed, exclusive);
if (ret)
goto out_unlock;
ret = security_task_setscheduler(task);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f4aede34449c..5961a97541c2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7124,7 +7124,7 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur,
 }
 
 int task_can_attach(struct task_struct *p,
-   const struct cpumask *cs_cpus_allowed)
+   const struct cpumask *cs_cpus_allowed, int exclusive)
 {
int ret = 0;
 
@@ -7143,7 +7143,7 @@ int task_can_attach(struct task_struct *p,
}
 
if (dl_task(p))
-   ret = dl_task_can_attach(p, cs_cpus_allowed);
+   ret = dl_task_can_attach(p, cs_cpus_allowed, exclusive);
 
 out:
return ret;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index c97d2c707b98..943aa32cc1bc 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2862,7 +2862,8 @@ bool dl_param_changed(struct task_struct *p, const struct 
sched_attr *attr)
 }
 
 #ifdef CONFIG_SMP
-int dl_task_can_attach(struct task_struct *p, const struct cpumask 
*cs_cpus_allowed)
+int dl_task_can_attach(struct task_struct *p,
+   const struct cpumask *cs_cpus_allowed, int exclusive)
 {
unsigned long flags, cap;
unsigned int dest_cpu;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f5acb6c5ce49..54881d99cebd 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -337,7 +337,7 @@ extern void __setparam_dl(struct task_struct *p, const 
struct sched_attr *attr);
 extern void __getparam_dl(struct task_struct *p, struct sched_attr *attr);
 extern bool __checkparam_dl(const struct sched_attr *attr);
 extern bool dl_param_changed(struct task_struct *p, const struct sched_attr 
*attr);
-extern int  dl_task_can_attach(struct task_struct *p, const struct cpumask 
*cs_cpus_allowed);
+extern int  dl_task_can_attach(struct task_struct *p, const struct cpumask 
*cs_cpus_allowed, int exclusive);
 extern int  dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const 
struct cpumask *trial);
 extern bool dl_cpu_busy(unsigned int cpu);
 
-- 
2.29.2

[PATCH 5/6] sched/deadline: Add helpers to get the correct root domain/span/dl_bw

2021-01-12 Thread Daniel Bristot de Oliveira

task_cpu() is often used on SCHED_DEADLINE to access the root_domain
and associated information. However, the task_cpu() does not always
reflect the correct CPU. The reason being is shown in the code:

>From kernel/sched/core:
- %< -
 * p->on_rq <- { 0, 1 = TASK_ON_RQ_QUEUED, 2 = TASK_ON_RQ_MIGRATING }:
 *
 *   is set by activate_task() and cleared by deactivate_task(), under
 *   rq->lock. Non-zero indicates the task is runnable, the special
 *   ON_RQ_MIGRATING state is used for migration without holding both
 *   rq->locks. It indicates task_cpu() is not stable, see task_rq_lock().
[...]
 * task_cpu(p): is changed by set_task_cpu(), the rules are:
 *
 *  - Don't call set_task_cpu() on a blocked task:
 *
 *We don't care what CPU we're not running on, this simplifies hotplug,
 *the CPU assignment of blocked tasks isn't required to be valid.
- >% -

So, a sleeping task will not have its task_cpu() stable, and this is
causing problems on SCHED_DEADLINE.

In preparation for fixing this problems, we need to add helper functions
that return the dl_task_cpu, root_domain, and "root" dl_bw that
reflects the current placement/affinity of the task.

Note that these functions are only required on the code path that
can happen when the task is not queued.

Signed-off-by: Daniel Bristot de Oliveira 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Li Zefan 
Cc: Tejun Heo 
Cc: Johannes Weiner 
Cc: Valentin Schneider 
Cc: linux-kernel@vger.kernel.org
Cc: cgro...@vger.kernel.org
---
 kernel/sched/sched.h | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 54881d99cebd..5f3f3cb5a6ff 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2393,6 +2393,37 @@ void __dl_update(struct dl_bw *dl_b, s64 bw)
rq->dl.extra_bw += bw;
}
 }
+
+static inline
+int dl_task_cpu(struct task_struct *p)
+{
+   /*
+* We can only rely on task_cpu() of runnable tasks.
+*/
+   if (p->on_rq)
+   return task_cpu(p);
+
+   /*
+* The task_cpu() of non-runnable task migth be pointing a CPU
+* it cannot run anymore (see set_task_cpu()). Hence, let's
+* get a possible cpu from the current cpu_mask.
+*/
+   return cpumask_any(p->cpus_ptr);
+
+}
+
+static inline
+struct root_domain *dl_task_rd(struct task_struct *p)
+{
+   int cpu = dl_task_cpu(p);
+   return cpu_rq(cpu)->rd;
+}
+
+static inline
+struct dl_bw *dl_task_root_bw(struct task_struct *p)
+{
+   return _task_rd(p)->dl_bw;
+}
 #else
 static inline
 void __dl_update(struct dl_bw *dl_b, s64 bw)
@@ -2401,6 +2432,18 @@ void __dl_update(struct dl_bw *dl_b, s64 bw)
 
dl->extra_bw += bw;
 }
+
+static inline
+int dl_task_cpu(struct task_struct *p)
+{
+   return 0;
+}
+
+static inline
+struct dl_bw *dl_task_root_bw(struct task_struct *p)
+{
+   return _rq(p)->dl.dl_bw;
+}
 #endif
 
 
-- 
2.29.2

[PATCH 4/6] sched/deadline: Block DL tasks on non-exclusive cpuset if bandwitdh control is enable

2021-01-12 Thread Daniel Bristot de Oliveira

The current SCHED_DEADLINE design supports only global scheduler,
or variants of it, i.e., clustered and partitioned, via cpuset config.
To enable the partitioning of a system with clusters of CPUs, the
documentation advises the usage of exclusive cpusets, creating an
exclusive root_domain for the cpuset.

Attempts to change the cpu affinity of a thread to a cpu mask different
from the root domain results in an error. For instance:

- %< -
[root@x1 linux]# chrt -d --sched-period 10 --sched-runtime 1 0 
sleep 1 &
[1] 69020
[root@x1 linux]# taskset -p -c 0 69020
pid 69020's current affinity list: 0-7
taskset: failed to set pid 69020's affinity: Device or resource busy
- >% -

However, such restriction can be bypassed by disabling the
SCHED_DEADLINE admission test, under the assumption that
the user is aware of the implications of such a decision.

However, Marco Perronet noticed that it was possible to
by-pass this mechanism because no restriction is currently
imposed by the cpuset mechanism.

For instance, this script:
- %< -
#!/bin/bash

# Enter on the cgroup directory
cd /sys/fs/cgroup/

# Check it if is cgroup v2 and enable cpuset
if [ -e cgroup.subtree_control ]; then
# Enable cpuset controller on cgroup v2
echo +cpuset > cgroup.subtree_control
fi

echo LOG: create a cpuset and assigned the CPU 0 to it
# Create cpuset groups
rmdir dl-group &> /dev/null
mkdir dl-group

# Restrict the task to the CPU 0
echo 0 > dl-group/cpuset.mems
echo 0 > dl-group/cpuset.cpus

# Place a task in the root cgroup
echo LOG: dispatching the first DL task
chrt -d --sched-period 10 --sched-runtime 1 0 sleep 100 &
ROOT_PID="$!"
ROOT_ALLOWED=`cat /proc/$ROOT_PID/status | grep Cpus_allowed_list | awk '{print 
$2}'`

# Disapatch another task in the root cgroup, to move it later.
echo LOG: dispatching the second DL task
chrt -d --sched-period 10 --sched-runtime 1 0 sleep 100 &
CPUSET_PID="$!"

# let them settle down
sleep 1

# Assign the second task to the cgroup
echo LOG: moving the second DL task to the cpuset
echo "$CPUSET_PID" > dl-group/cgroup.procs 2> /dev/null
ACCEPTED=$?
CPUSET_ALLOWED=`cat /proc/$CPUSET_PID/status | grep Cpus_allowed_list | awk 
'{print $2}'`

if [ $ACCEPTED == 0 ]; then
echo FAIL: a DL task was accepted on a non-exclusive cpuset
else
echo PASS: DL task was rejected on a non-exclusive cpuset
fi

if [ $ROOT_ALLOWED == $CPUSET_ALLOWED ]; then
echo PASS: the affinity did not change: $CPUSET_ALLOWED == $ROOT_ALLOWED
else
echo FAIL: the cpu affinity is different: $CPUSET_ALLOWED == 
$ROOT_ALLOWED
fi

# Just ignore the clean up
exec > /dev/null 2>&1
kill -9 $CPUSET_PID
kill -9 $ROOT_PID
rmdir dl-group
- >% -

Shows these results:
- %< -
LOG: create a cpuset and assigned the CPU 0 to it
LOG: dispatching the first DL task
LOG: dispatching the second DL task
LOG: moving the second DL task to the cpuset
FAIL: a DL task was accepted on a non-exclusive cpuset
FAIL: the cpu affinity is different: 0 == 0-3
- >% -

This result is a problem because the two tasks have a different
cpu mask, but they end up sharing the cpu 0, which is something
not supported in the current SCHED_DEADLINE designed (APA - Arbitrary
Processor Affinities).

To avoid such scenario, the correct action to be taken is rejecting
the attach of SCHED_DEADLINE thread to a non-exclusive cpuset.

With the proposed patch in place, the script above returns:

- %< -
LOG: create a cpuset and assigned the CPU 0 to it
LOG: dispatching the first DL task
LOG: dispatching the second DL task
LOG: moving the second DL task to the cpuset
PASS: DL task was rejected on a non-exclusive cpuset
PASS: the affinity did not change: 0-3 == 0-3
- >% -

Still, likewise for taskset, this restriction can be bypassed by
disabling the admission test, i.e.:

# sysctl -w kernel.sched_rt_runtime_us=-1

and work at their own risk.

Reported-by: Marco Perronet 
Signed-off-by: Daniel Bristot de Oliveira 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Li Zefan 
Cc: Tejun Heo 
Cc: Johannes Weiner 
Cc: Valentin Schneider 
Cc: linux-kernel@vger.kernel.org
Cc: cgro...@vger.kernel.org
---
 kernel/sched/deadline.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 788a391657a5..c221e14d5b86 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2878,6 +2878,13 @@ int dl_task_can_attach(struct task_struct *p,
if (cpumask_empty(cs_cpus_allowed))
return 0;
 
+   /*
+* Do not allow moving tasks to non-exclusive cpusets
+* if bandwidth control is enabled.
+*/
+   if (dl_

[PATCH 3/6] sched/deadline: Allow DL tasks on empty (cgroup v2) cpusets

2021-01-12 Thread Daniel Bristot de Oliveira

cgroups v2 allows the cpuset controller to be enabled/disabled on
demand. On Fedora 32, cpuset is disabled by default. To enable it,
a user needs to:

  # cd /sys/fs/cgroup/
  # echo +cpuset > cgroup.subtree_control

Existing cgroups will expose the cpuset interface (e.g., cpuset.cpus
file). By default, cpuset.cpus has no CPU assigned, which means that
existing tasks will move to a cpuset without cpus.

With that in mind, look what happens if a SCHED_DEADLINE task exists
on any cgroup (user.slice by default on Fedora):

- %< -
  # chrt -d --sched-period 10 --sched-runtime 1 0 sleep 100 &
  # cd /sys/fs/cgroup/
  # echo '+cpuset' > cgroup.subtree_control
  [   65.384041] BUG: unable to handle page fault for address: b720f7e0
  [   65.384551] #PF: supervisor read access in kernel mode
  [   65.384923] #PF: error_code(0x) - not-present page
  [   65.385298] PGD 61a15067 P4D 61a15067 PUD 61a16063 PMD 800f9ddff062
  [   65.385781] Oops:  [#1] SMP PTI
  [   65.386042] CPU: 0 PID: 799 Comm: sh Not tainted 5.10.0-rc3 #1
  [   65.386461] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.13.0-2.fc32 04/01/2014
  [   65.387077] RIP: 0010:dl_task_can_attach+0x40/0x250
  [   65.387429] Code: 54 55 53 48 83 ec 18 48 89 3c 24 bf ff ff ff ff e8 05 a2 
52 00
   4c 63 f0 48 c7 c5 00 9e 02 00 4a 8b 04 f5 00 09 47 b6 48 
89 ea
   <4c> 8b a4 10 e0 09 00 00 49 8d 44 24 40 48 89 c7 48 89 
44 24
   08 e8
  [   65.388768] RSP: 0018:aee8c056fcd8 EFLAGS: 00010283
  [   65.389148] RAX: b71e5000 RBX: aee8c056fdd0 RCX: 
0040
  [   65.389661] RDX: 00029e00 RSI: 9db202534e48 RDI: 
b6d3a3e0
  [   65.390174] RBP: 00029e00 R08:  R09: 
0004
  [   65.390686] R10: 0001 R11: ffa6fbff R12: 
aee8c056fbf0
  [   65.391196] R13: 9db2024e1400 R14: 0004 R15: 
9db20ebb31e0
  [   65.391710] FS:  7f6df41b1740() GS:9db377c0() 
knlGS:
  [   65.392289] CS:  0010 DS:  ES:  CR0: 80050033
  [   65.392705] CR2: b720f7e0 CR3: 00010680a003 CR4: 
00370ef0
  [   65.393220] DR0:  DR1:  DR2: 

  [   65.393732] DR3:  DR6: fffe0ff0 DR7: 
0400
  [   65.394244] Call Trace:
  [   65.394437]  cpuset_can_attach+0x8b/0x110
  [   65.394732]  cgroup_migrate_execute+0x70/0x430
  [   65.395057]  cgroup_update_dfl_csses+0x222/0x230
  [   65.395392]  cgroup_subtree_control_write+0x2c6/0x3c0
  [   65.395759]  kernfs_fop_write+0xce/0x1b0
  [   65.396048]  vfs_write+0xc2/0x230
  [   65.396291]  ksys_write+0x4f/0xc0
  [   65.396533]  do_syscall_64+0x33/0x40
  [   65.396797]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [   65.397166] RIP: 0033:0x7f6df42a6537
  [   65.397428] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 
f3 0f
   1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 
0f 05
   <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 
48 89
   74 24
  [   65.398766] RSP: 002b:7ffee4128018 EFLAGS: 0246 ORIG_RAX: 
0001
  [   65.399838] RAX: ffda RBX: 0008 RCX: 
7f6df42a6537
  [   65.400923] RDX: 0008 RSI: 55b3f7e549e0 RDI: 
0001
  [   65.402003] RBP: 55b3f7e549e0 R08: 000a R09: 
0007
  [   65.403082] R10: 0004 R11: 0246 R12: 
0008
  [   65.404156] R13: 7f6df4378500 R14: 0008 R15: 
7f6df4378700
  [   65.405218] Modules linked in: 
  [   65.414172] CR2: b720f7e0
  [   65.415117] ---[ end trace 2dbff1a688549e65 ]---
- >% -

That happens because on dl_task_can_attach():
dest_cpu = cpumask_any_and(cpu_active_mask, cs_cpus_allowed);

returns a non active cpu.

Initially, I thought about returning an error and blocking the
operation. However, that is indeed not needed. The cpuset without
CPUs assigned will be a non-root cpuset, hence its cpu mask will
be the same as the root one. So, the bandwidth was already accounted,
and the task can proceed.

Signed-off-by: Daniel Bristot de Oliveira 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Li Zefan 
Cc: Tejun Heo 
Cc: Johannes Weiner 
Cc: Valentin Schneider 
Cc: linux-kernel@vger.kernel.org
Cc: cgro...@vger.kernel.org
---
 kernel/sched/deadline.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 943aa32cc1bc..788a391657a5 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2871,6 +2871,13 @@ int dl_task_can_attach(struct task_stru

[PATCH 1/6] sched/deadline: Consolidate the SCHED_DL task_can_attach() check on its own function

2021-01-12 Thread Daniel Bristot de Oliveira

It is easier to track the restrictions imposed on SCHED_DEADLINE
tasks if we keep them all together in a single function.

No function changes.

Signed-off-by: Daniel Bristot de Oliveira 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Li Zefan 
Cc: Tejun Heo 
Cc: Johannes Weiner 
Cc: Valentin Schneider 
Cc: linux-kernel@vger.kernel.org
Cc: cgro...@vger.kernel.org
---
 kernel/sched/core.c | 3 +--
 kernel/sched/deadline.c | 7 +++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7af80c3fce12..f4aede34449c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7142,8 +7142,7 @@ int task_can_attach(struct task_struct *p,
goto out;
}
 
-   if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
- cs_cpus_allowed))
+   if (dl_task(p))
ret = dl_task_can_attach(p, cs_cpus_allowed);
 
 out:
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 75686c6d4436..c97d2c707b98 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2870,6 +2870,13 @@ int dl_task_can_attach(struct task_struct *p, const 
struct cpumask *cs_cpus_allo
bool overflow;
int ret;
 
+   /*
+* The task is not moving to another root domain, so it is
+* already accounted.
+*/
+   if (cpumask_intersects(task_rq(p)->rd->span, cs_cpus_allowed))
+   return 0;
+
dest_cpu = cpumask_any_and(cpu_active_mask, cs_cpus_allowed);
 
rcu_read_lock_sched();
-- 
2.29.2

[PATCH 0/6] sched/deadline: cpuset task acceptance review

2021-01-12 Thread Daniel Bristot de Oliveira

While surveying the properties of the SCHED_DEADLINE, Marco Perronet found some
inconsistencies in the acceptance of DL threads on cpuset. More precisely,
regarding the acceptance of treads with arbitrary affinity. He contacted me,
and while doing the investigation, we found yet other potential issues
addressed in this patch series.

Each patch has a more in-depth explanation, including ways to reproduce
the problem. 

Daniel Bristot de Oliveira (6):
  sched/deadline: Consolidate the SCHED_DL task_can_attach() check on
its own function
  sched/deadline: Inform dl_task_can_attach() if the cpuset is exclusive
  sched/deadline: Allow DL tasks on empty (cgroup v2) cpusets
  sched/deadline: Block DL tasks on non-exclusive cpuset if bandwitdh
control is enable
  sched/deadline: Add helpers to get the correct root domain/span/dl_bw
  sched/deadline: Fixes cpu/rd/dl_bw references for suspended tasks

 include/linux/sched.h   |  2 +-
 kernel/cgroup/cpuset.c  |  5 -
 kernel/sched/core.c | 13 ++--
 kernel/sched/deadline.c | 28 ++---
 kernel/sched/sched.h| 45 -
 5 files changed, 80 insertions(+), 13 deletions(-)

-- 
2.29.2

Re: [PATCH] notifier: Make atomic_notifiers use raw_spinlock

2020-11-30 Thread Daniel Bristot de Oliveira

On 11/22/20 9:19 PM, Valentin Schneider wrote:
> Booting a recent PREEMPT_RT kernel (v5.10-rc3-rt7-rebase) on my arm64 Juno
> leads to the idle task blocking on an RT sleeping spinlock down some
> notifier path:
> 
>   [1.809101] BUG: scheduling while atomic: swapper/5/0/0x0002
>   [1.809116] Modules linked in:
>   [1.809123] Preemption disabled at:
>   [1.809125] secondary_start_kernel (arch/arm64/kernel/smp.c:227)
>   [1.809146] CPU: 5 PID: 0 Comm: swapper/5 Tainted: GW 
> 5.10.0-rc3-rt7 #168
>   [1.809153] Hardware name: ARM Juno development board (r0) (DT)
>   [1.809158] Call trace:
>   [1.809160] dump_backtrace (arch/arm64/kernel/stacktrace.c:100 
> (discriminator 1))
>   [1.809170] show_stack (arch/arm64/kernel/stacktrace.c:198)
>   [1.809178] dump_stack (lib/dump_stack.c:122)
>   [1.809188] __schedule_bug (kernel/sched/core.c:4886)
>   [1.809197] __schedule (./arch/arm64/include/asm/preempt.h:18 
> kernel/sched/core.c:4913 kernel/sched/core.c:5040)
>   [1.809204] preempt_schedule_lock (kernel/sched/core.c:5365 
> (discriminator 1))
>   [1.809210] rt_spin_lock_slowlock_locked (kernel/locking/rtmutex.c:1072)
>   [1.809217] rt_spin_lock_slowlock (kernel/locking/rtmutex.c:1110)
>   [1.809224] rt_spin_lock (./include/linux/rcupdate.h:647 
> kernel/locking/rtmutex.c:1139)
>   [1.809231] atomic_notifier_call_chain_robust (kernel/notifier.c:71 
> kernel/notifier.c:118 kernel/notifier.c:186)
>   [1.809240] cpu_pm_enter (kernel/cpu_pm.c:39 kernel/cpu_pm.c:93)
>   [1.809249] psci_enter_idle_state (drivers/cpuidle/cpuidle-psci.c:52 
> drivers/cpuidle/cpuidle-psci.c:129)
>   [1.809258] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:238)
>   [1.809267] cpuidle_enter (drivers/cpuidle/cpuidle.c:353)
>   [1.809275] do_idle (kernel/sched/idle.c:132 kernel/sched/idle.c:213 
> kernel/sched/idle.c:273)
>   [1.809282] cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1))
>   [1.809288] secondary_start_kernel (arch/arm64/kernel/smp.c:273)
> 
> Two points worth noting:
> 
> 1) That this is conceptually the same issue as pointed out in:
>313c8c16ee62 ("PM / CPU: replace raw_notifier with atomic_notifier")
> 2) Only the _robust() variant of atomic_notifier callchains suffer from
>this
> 
> AFAICT only the cpu_pm_notifier_chain really needs to be changed, but
> singling it out would mean introducing a new (truly) non-blocking API. At
> the same time, callers that are fine with any blocking within the call
> chain should use blocking notifiers, so patching up all atomic_notifier's
> doesn't seem *too* crazy to me.
> 
> Fixes: 70d932985757 ("notifier: Fix broken error handling pattern")
> Signed-off-by: Valentin Schneider 

Reviewed-by: Daniel Bristot de Oliveira 

Thanks!
-- Daniel

Re: [PATCH v2] sched/deadline: Fix priority inheritance with multiple scheduling classes

2020-11-16 Thread Daniel Bristot de Oliveira

On 11/17/20 7:14 AM, Juri Lelli wrote:
> Glenn reported that "an application [he developed produces] a BUG in
> deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested
> PTHREAD_PRIO_INHERIT mutexes.  I believe the bug is triggered when a CFS
> task that was boosted by a SCHED_DEADLINE task boosts another CFS task
> (nested priority inheritance).
> 
>  [ cut here ]
>  kernel BUG at kernel/sched/deadline.c:1462!
>  invalid opcode:  [#1] PREEMPT SMP
>  CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ...
>  Hardware name: ...
>  RIP: 0010:enqueue_task_dl+0x335/0x910
>  Code: ...
>  RSP: 0018:c9000c2bbc68 EFLAGS: 00010002
>  RAX: 0009 RBX: 888c0af94c00 RCX: 81e12500
>  RDX: 002e RSI: 888c0af94c00 RDI: 888c10b22600
>  RBP: c9000c2bbd08 R08: 0009 R09: 0078
>  R10: 81e12440 R11: 81e1236c R12: 888bc8932600
>  R13: 888c0af94eb8 R14: 888c10b22600 R15: 888bc8932600
>  FS:  7fa58ac55700() GS:888c10b0() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2: 7fa58b523230 CR3: 000bf44ab003 CR4: 007606e0
>  DR0:  DR1:  DR2: 
>  DR3:  DR6: fffe0ff0 DR7: 0400
>  PKRU: 5554
>  Call Trace:
>   ? intel_pstate_update_util_hwp+0x13/0x170
>   rt_mutex_setprio+0x1cc/0x4b0
>   task_blocks_on_rt_mutex+0x225/0x260
>   rt_spin_lock_slowlock_locked+0xab/0x2d0
>   rt_spin_lock_slowlock+0x50/0x80
>   hrtimer_grab_expiry_lock+0x20/0x30
>   hrtimer_cancel+0x13/0x30
>   do_nanosleep+0xa0/0x150
>   hrtimer_nanosleep+0xe1/0x230
>   ? __hrtimer_init_sleeper+0x60/0x60
>   __x64_sys_nanosleep+0x8d/0xa0
>   do_syscall_64+0x4a/0x100
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>  RIP: 0033:0x7fa58b52330d
>  ...
>  ---[ end trace 0002 ]—
> 
> He also provided a simple reproducer creating the situation below:
> 
>  So the execution order of locking steps are the following
>  (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2
>  are mutexes that are enabled * with priority inheritance.)
> 
>  Time moves forward as this timeline goes down:
> 
>  N1  N2   D1
>  |   ||
>  |   ||
>  Lock(M1)||
>  |   ||
>  | Lock(M2)   |
>  |   ||
>  |   |  Lock(M2)
>  |   ||
>  | Lock(M1)   |
>  | (!!bug triggered!) |
> 
> Daniel reported a similar situation as well, by just letting ksoftirqd
> run with DEADLINE (and eventually block on a mutex).
> 
> Problem is that boosted entities (Priority Inheritance) use static
> DEADLINE parameters of the top priority waiter. However, there might be
> cases where top waiter could be a non-DEADLINE entity that is currently
> boosted by a DEADLINE entity from a different lock chain (i.e., nested
> priority chains involving entities of non-DEADLINE classes). In this
> case, top waiter static DEADLINE parameters could be null (initialized
> to 0 at fork()) and replenish_dl_entity() would hit a BUG().
> 
> Fix this by keeping track of the original donor and using its parameters
> when a task is boosted.
> 
> Reported-by: Glenn Elliott 
> Reported-by: Daniel Bristot de Oliveira 
> Signed-off-by: Juri Lelli 

Tested-by: Daniel Bristot de Oliveira 

Thanks!
-- Daniel

Re: [PATCH v4 00/19] sched: Migrate disable support

2020-11-09 Thread Daniel Bristot de Oliveira

On 10/23/20 12:11 PM, Peter Zijlstra wrote:
> Hi,
> 
> The fourth version of migrate_disable() for PREEMPT_RT.
> 
> Two changes since last time:
> 
>  - fixes !SMP builds (bigeasy)
>  - TLA+ validation of migrate_disable() vs sched_setaffinity() (valsch)
> 
> Esp. that latter resulted in significant changes to patch #10. Huge thanks to
> Valentin.

While I will still work on this, testing more and trying to
analyze the effects from the (rt) scheduling perspective, I
do not see any other issues (kudos to Valentin, impressive
review).

Feel free to add:

Reviewed-by: Daniel Bristot de Oliveira 

Thanks!
-- Daniel

Re: [PATCH] sched/deadline: Fix priority inheritance with multiple scheduling classes

2020-11-05 Thread Daniel Bristot de Oliveira

On 11/5/20 5:12 PM, Juri Lelli wrote:
> Hi Valentin,
> 
> On 05/11/20 15:49, Valentin Schneider wrote:
>>
>> Hi Juri,
>>
>> On 05/11/20 07:50, Juri Lelli wrote:
>>> He also provided a simple reproducer creating the situation below:
>>>
>>>  So the execution order of locking steps are the following
>>>  (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2
>>>  are mutexes that are enabled * with priority inheritance.)
>>>
>>>  Time moves forward as this timeline goes down:
>>>
>>>  N1  N2   D1
>>>  |   ||
>>>  |   ||
>>>  Lock(M1)||
>>>  |   ||
>>>  | Lock(M2)   |
>>>  |   ||
>>>  |   |  Lock(M2)
>>>  |   ||
>>>  | Lock(M1)   |
>>>  | (!!bug triggered!) |
>>>
>>> Daniel reported a similar situation as well, by just letting ksoftirqd
>>> run with DEADLINE (and eventually block on a mutex).
>>>
>>> Problem is that boosted entities (Priority Inheritance) use static
>>> DEADLINE parameters of the top priority waiter. However, there might be
>>> cases where top waiter could be a non-DEADLINE entity that is currently
>>> boosted by a DEADLINE entity from a different lock chain (i.e., nested
>>> priority chains involving entities of non-DEADLINE classes). In this
>>> case, top waiter static DEADLINE parameters could be null (initialized
>>> to 0 at fork()) and replenish_dl_entity() would hit a BUG().
>>>
>>
>> IIUC, rt_mutex_get_top_task(N1) == N2, and N2->dl->deadline = 0, which
>> makes enqueue_task_dl() unhappy. And that happens because, unlike p->prio,
>> DL parameters aren't properly propagated through the chain(s).
> 
> Yep. Replenishment as well, as it's going to find dl_runtime == 0.
> 
>>> Fix this by keeping track of the original donor and using its parameters
>>> when a task is boosted.
>>>
>>> Reported-by: Glenn Elliott 
>>> Reported-by: Daniel Bristot de Oliveira 
>>> Signed-off-by: Juri Lelli 
>>>
>>> ---
>>>
>>> This is actually a v2 attempt (didn't keep $SUBJECT since it's quite
>>> different than v1 [1]) to fix this problem.
>>>
>>> v1 was admittedly pretty ugly. Hope this looks slightly better (even
>>> though there is of course overhead associated to the additional
>>> pointer).
>>>
>>> Again, the proper way to fix this is by proxy execution. But, I don't
>>> think we are yet there and this problem needs a quick band-aid.
>>>
>>> One could probably also think to complicate the present approach and
>>> actually perform accounting using donor's dynamic parameters, but I fear
>>> it would be of little benefit since it would still bring all the
>>> problems associated to affinities. So, I'd propose let's try to fix all
>>> this properly with proxy and just avoid crashes in the meantime.
>>>
>>
>> For my own sake, what affinity problems are you thinking of?
>>
>> With proxy exec we have this "funny" dance of shoving the entire blocked-on
>> chain on a single runqueue to get the right selection out of
>> pick_next_task(), and that needs to deal with affinity (i.e. move the task
>> back to a sensible rq once it becomes runnable).
>>
>> With the current PI, the waiting tasks are blocked and enqueued in the
>> pi_waiters tree, so as I see it affinity shouldn't matter; what am I
>> missing / not seeing? Is that related to bandwidth handling?
> 
> Think we might break admission control checks if donor and bosted are,
> for example, on different exclusive sets of CPUs. Guess that is a
> problem with proxy as well, though. As said in the comment above, this
> is unfortunately not much more than a band-aid. Hoping we can buy us
> some time and fix it properly with proxy.

I agree with Juri that the current approach is known to be broken,
and that the proxy execution seems to be the mechanisms to go to
try to address these problems. However, this will take some time.

Meanwhile, this patch that Juri proposes fixes problem
in the current mechanism - using the same approach (and breaking
in a known way :D).

A proper way to handle the priority inversion with a disjoint
set of CPUs is something that will also be an issue with proxy
execution. But that is an even

Re: [PATCH v6 0/2] sched/deadline: Fix and optimize sched_dl_global_validate()

2020-10-18 Thread Daniel Bristot de Oliveira

Hi,

On 10/14/20 3:32 PM, Juri Lelli wrote:
> Hi,
> 
> On 08/10/20 23:47, Peng Liu wrote:
>> When change global rt bandwidth, we check to make sure that new
>> settings could accommodate the allocated dl bandwidth.
>>
>> Under SMP, the dl_bw is on a per root domain basis, currently we check
>> and update the new settings one cpu by one cpu, but not in the unit of
>> root domain, which is either overdoing or error.
>>
>> patch 1 removed the superfluous checking and updating
>> patch 2 fixed the error validation
>>
>> For details, please see the corresponding patch.
>>
>> 
>> v6 <-- v5:
>>  - no functional changes, just revert visit_gen back to u64;
>>
>> v5 <-- v4:
>>  - no functional changes, just split the v4 single patch to two to
>>obey the "one patch do only one thing" rule;
>>  - turn root_domain::visit_gen from u64 to u32;
>>both suggested by Juri.
>>  - refine changelog;
>>
>> v4 <-- v3:
>>  - refine changelog;
>>  - eliminate the ugly #ifdef guys with Peter's method;
>>
>> v3 <-- v2:
>>  - fix build error for !CONFIG_SMP, reported by kernel test robot;
>>
>> v2 <-- v1:
>>  - replace cpumask_weight(cpu_rq(cpu)->rd->span) with dl_bw_cpus(cpu),
>>suggested by Juri;
>>
>> Peng Liu (2):
>>   sched/deadline: Optimize sched_dl_global_validate()
>>   sched/deadline: Fix sched_dl_global_validate()
>>
>>  kernel/sched/deadline.c | 44 +++
>>  kernel/sched/sched.h| 51 ++---
>>  kernel/sched/topology.c |  1 +
>>  3 files changed, 63 insertions(+), 33 deletions(-)
> 
> These look now good to me. Thanks a lot!
> 
> Acked-by: Juri Lelli 


I just found a minor thing in a comment:

"It's *monotonously* increasing."

I tried to find the usage of "monotonously increasing" for monotonic functions,
and I did not find it. The (as least most used) term is "monotonically," so the
sentence would be like "It is a monotonically increasing value." But it is just
a minor thing. Anyways:

Reviewed-by: Daniel Bristot de Oliveira 

Thanks!
-- Daniel

[tip: sched/core] sched/deadline: Unthrottle PI boosted threads while enqueuing

2020-10-05 Thread tip-bot2 for Daniel Bristot de Oliveira

The following commit has been merged into the sched/core branch of tip:

Commit-ID: feff2e65efd8d84cf831668e182b2ce73c604bbb
Gitweb:
https://git.kernel.org/tip/feff2e65efd8d84cf831668e182b2ce73c604bbb
Author:Daniel Bristot de Oliveira 
AuthorDate:Wed, 16 Sep 2020 09:06:39 +02:00
Committer: Peter Zijlstra 
CommitterDate: Sat, 03 Oct 2020 16:30:53 +02:00

sched/deadline: Unthrottle PI boosted threads while enqueuing

stress-ng has a test (stress-ng --cyclic) that creates a set of threads
under SCHED_DEADLINE with the following parameters:

dl_runtime   =  1 (10 us)
dl_deadline  = 10 (100 us)
dl_period= 10 (100 us)

These parameters are very aggressive. When using a system without HRTICK
set, these threads can easily execute longer than the dl_runtime because
the throttling happens with 1/HZ resolution.

During the main part of the test, the system works just fine because
the workload does not try to run over the 10 us. The problem happens at
the end of the test, on the exit() path. During exit(), the threads need
to do some cleanups that require real-time mutex locks, mainly those
related to memory management, resulting in this scenario:

Note: locks are rt_mutexes...
 
TASK A: TASK B: TASK C:
activation
activation
activation

lock(a): OK!lock(b): OK!

lock(a)
-> block (task A owns it)
  -> self notice/set throttled
 +--< -> arm replenished timer
 |  switch-out
 |  lock(b)
 |  ->  B prio>
 |  -> boost TASK B
 |  unlock(a)   switch-out
 |  -> handle lock a to B
 |-> wakeup(B)
 |  -> B is throttled:
 |-> do not enqueue
 | switch-out
 |
 |
 +-> replenishment timer
-> TASK B is boosted:
  -> do not enqueue
 

BOOM: TASK B is runnable but !enqueued, holding TASK C: the system
crashes with hung task C.

This problem is avoided by removing the throttle state from the boosted
thread while boosting it (by TASK A in the example above), allowing it to
be queued and run boosted.

The next replenishment will take care of the runtime overrun, pushing
the deadline further away. See the "while (dl_se->runtime <= 0)" on
replenish_dl_entity() for more information.

Reported-by: Mark Simmons 
Signed-off-by: Daniel Bristot de Oliveira 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Juri Lelli 
Tested-by: Mark Simmons 
Link: 
https://lkml.kernel.org/r/5076e003450835ec74e6fa5917d02c4fa41687e6.1600170294.git.bris...@redhat.com
---
 kernel/sched/deadline.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index c19c188..6d93f45 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1525,6 +1525,27 @@ static void enqueue_task_dl(struct rq *rq, struct 
task_struct *p, int flags)
 */
if (pi_task && dl_prio(pi_task->normal_prio) && p->dl.dl_boosted) {
pi_se = _task->dl;
+   /*
+* Because of delays in the detection of the overrun of a
+* thread's runtime, it might be the case that a thread
+* goes to sleep in a rt mutex with negative runtime. As
+* a consequence, the thread will be throttled.
+*
+* While waiting for the mutex, this thread can also be
+* boosted via PI, resulting in a thread that is throttled
+* and boosted at the same time.
+*
+* In this case, the boost overrides the throttle.
+*/
+   if (p->dl.dl_throttled) {
+   /*
+* The replenish timer needs to be canceled. No
+* problem if it fires concurrently: boosted threads
+* are ignored in dl_task_timer().
+*/
+   hrtimer_try_to_cancel(>dl.dl_timer);
+   p->dl.dl_throttled = 0;
+   }
} else if (!dl_prio(p->normal_prio)) {
/*
 * Special case in which we have a !SCHED_DEADLINE task that is 
going

Re: [PATCH] sched/deadline: Unthrottle PI boosted threads while enqueuing

2020-10-02 Thread Daniel Bristot de Oliveira

On 9/18/20 8:00 AM, Juri Lelli wrote:
> Hi Daniel,
> 
> On 16/09/20 09:06, Daniel Bristot de Oliveira wrote:
>> stress-ng has a test (stress-ng --cyclic) that creates a set of threads
>> under SCHED_DEADLINE with the following parameters:
>>
>> dl_runtime   =  1 (10 us)
>> dl_deadline  = 10 (100 us)
>> dl_period= 10 (100 us)
>>
>> These parameters are very aggressive. When using a system without HRTICK
>> set, these threads can easily execute longer than the dl_runtime because
>> the throttling happens with 1/HZ resolution.
>>
>> During the main part of the test, the system works just fine because
>> the workload does not try to run over the 10 us. The problem happens at
>> the end of the test, on the exit() path. During exit(), the threads need
>> to do some cleanups that require real-time mutex locks, mainly those
>> related to memory management, resulting in this scenario:
>>
>> Note: locks are rt_mutexes...
>>  
>> TASK A:  TASK B: TASK C:
>> activation
>>  activation
>>  activation
>>
>> lock(a): OK! lock(b): OK!
>>  
>>  lock(a)
>>  -> block (task A owns it)
>>-> self notice/set throttled
>>  +--<  -> arm replenished timer
>>  |   switch-out
>>  |   lock(b)
>>  |   ->  B 
>> prio>
>>  |   -> boost TASK B
>>  |  unlock(a)switch-out
>>  |  -> handle lock a to B
>>  |-> wakeup(B)
>>  |  -> B is throttled:
>>  |-> do not enqueue
>>  | switch-out
>>  |
>>  |
>>  +-> replenishment timer
>>  -> TASK B is boosted:
>>-> do not enqueue
>>  
>>
>> BOOM: TASK B is runnable but !enqueued, holding TASK C: the system
>> crashes with hung task C.
>>
>> This problem is avoided by removing the throttle state from the boosted
>> thread while boosting it (by TASK A in the example above), allowing it to
>> be queued and run boosted.
>>
>> The next replenishment will take care of the runtime overrun, pushing
>> the deadline further away. See the "while (dl_se->runtime <= 0)" on
>> replenish_dl_entity() for more information.
>>
>> Signed-off-by: Daniel Bristot de Oliveira 
>> Reported-by: Mark Simmons 
>> Reviewed-by: Juri Lelli 
>> Tested-by: Mark Simmons 
>> Cc: Ingo Molnar 
>> Cc: Peter Zijlstra 
>> Cc: Juri Lelli 
>> Cc: Vincent Guittot 
>> Cc: Dietmar Eggemann 
>> Cc: Steven Rostedt 
>> Cc: Ben Segall 
>> Cc: Mel Gorman 
>> Cc: Daniel Bristot de Oliveira 
>> Cc: linux-kernel@vger.kernel.org
>>
>> ---
> 
> Thanks for this fix.
> 
> Acked-by: Juri Lelli 

This is a gentle ping... [we are facing this bug in practice :-(].

-- Daniel

> Best,
> Juri
>

[tip: sched/core] sched/rt: Disable RT_RUNTIME_SHARE by default

2020-09-29 Thread tip-bot2 for Daniel Bristot de Oliveira

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 2586af1ac187f6b3a50930a4e33497074e81762d
Gitweb:
https://git.kernel.org/tip/2586af1ac187f6b3a50930a4e33497074e81762d
Author:Daniel Bristot de Oliveira 
AuthorDate:Mon, 21 Sep 2020 16:39:49 +02:00
Committer: Peter Zijlstra 
CommitterDate: Fri, 25 Sep 2020 14:23:24 +02:00

sched/rt: Disable RT_RUNTIME_SHARE by default

The RT_RUNTIME_SHARE sched feature enables the sharing of rt_runtime
between CPUs, allowing a CPU to run a real-time task up to 100% of the
time while leaving more space for non-real-time tasks to run on the CPU
that lend rt_runtime.

The problem is that a CPU can easily borrow enough rt_runtime to allow
a spinning rt-task to run forever, starving per-cpu tasks like kworkers,
which are non-real-time by design.

This patch disables RT_RUNTIME_SHARE by default, avoiding this problem.
The feature will still be present for users that want to enable it,
though.

Signed-off-by: Daniel Bristot de Oliveira 
Signed-off-by: Peter Zijlstra (Intel) 
Tested-by: Wei Wang 
Link: 
https://lkml.kernel.org/r/b776ab46817e3db5d8ef79175fa0d71073c051c7.1600697903.git.bris...@redhat.com
---
 kernel/sched/features.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 7481cd9..68d369c 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -77,7 +77,7 @@ SCHED_FEAT(WARN_DOUBLE_CLOCK, false)
 SCHED_FEAT(RT_PUSH_IPI, true)
 #endif
 
-SCHED_FEAT(RT_RUNTIME_SHARE, true)
+SCHED_FEAT(RT_RUNTIME_SHARE, false)
 SCHED_FEAT(LB_MIN, false)
 SCHED_FEAT(ATTACH_AGE_LOAD, true)

Re: [ANNOUNCE][CFP] Real-Time Linux Mini-Summit 2020 - Virtual

2020-09-28 Thread Daniel Bristot de Oliveira

Hi all!

This is a gentle reminder about the CFP deadline for the Real-time Linux
Mini-Summit 2020.

The CFP will be open until September 30th, 2020, at 23:59 PST.

IOW: this Wednesday (in the previous mail I said that Sept 30th was on Saturday,
which is not the case (at last, not this year :-)) my bad)

Thanks!

On 9/18/20 5:17 PM, Daniel Bristot de Oliveira wrote:
> The Real-Time Mini-Summit is organized by the Linux Foundation Real-Time Linux
> (RTL) collaborative project. The event is intended to gather developers and
> users of Linux as a Real-Time Operating System. The main intent is to provide
> room for discussion between developers, tooling experts, and users.
> 
> The mini-summit will take place alongside the Open Source Summit + Embedded
> Linux Conference Europe 2020, virtually. The summit is planned last day of the
> main conference, Friday, October 30th, 2020.
> 
> If you are interested to present, please submit a proposal [1] before 
> September
> 30th, 2020, at 23:59 EST. Please provide a title, an abstract describing the
> proposed talk, a short biography, and describe the targeted audience. Please
> indicate the slot length you are aiming for: The format is a single track with
> presentation slots of 15, 30 minutes long. Considering that the presentation
> should leave a space for discussion.
> 
> We are welcoming presentations from both end-users and developers, on topics
> covering, but not limited to:
> 
>  - Real-time Linux development
>  - Real-time Linux evaluation
>  - Real-time Linux use cases (Success and failures)
>  - Real-time Linux tooling (tracing, configuration, …)
> 
> Those can cover recently available technologies, ongoing work, and new ideas.
> 
> Important Notes for Speakers:
> 
> All speakers are required to adhere to the Linux Foundation events’ Code of
> Conduct. We also highly recommend that speakers take the Linux Foundation 
> online
> Inclusive Speaker Orientation Course. Avoid sales or marketing pitches and
> discussing unlicensed or potentially closed-source technologies when preparing
> your proposal; these talks are almost always rejected due to the fact that 
> they
> take away from the integrity of our events, and are rarely well-received by
> conference attendees.
> 
> All accepted speakers are required to submit their slides prior to the event.
> 
> [1] Submission page: https://forms.gle/yQeqyrtJYezM5VRJA
> 
> Important Dates:
> 
> CFP Close: Wednesday, September 30th, 2020, at 23:59 PST
> Speaker notification: October 10th, 2020
> Conference: Friday, October 30th, 2020
> 
> Questions on submitting a proposal?
> Email Daniel Bristot de Oliveira 
>

Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-24 Thread Daniel Bristot de Oliveira

On 9/24/20 10:27 AM, pet...@infradead.org wrote:
> So my current todo list is:
> 
>  - Change RT PULL
>  - Change DL PULL
>  - Add migrate_disable() tracer; exactly like preempt/irqoff, except
>measuring task-runtime instead of cpu-time.
>  - Add a mode that measures actual interference.
>  - Add a traceevent to detect preemption in migrate_disable().
> 
> 
> And then I suppose I should twist Daniel's arm to update his model to
> include these scenarios and numbers.

Challenge accepted :-)

-- Daniel

Re: [PATCH 7/9] sched: Add migrate_disable()

2020-09-23 Thread Daniel Bristot de Oliveira

On 9/23/20 7:08 PM, pet...@infradead.org wrote:
> On Wed, Sep 23, 2020 at 10:31:10AM +0200, Thomas Gleixner wrote:
>>In practice migrate disable could be taken into account on placement
>>decisions, but yes we don't have anything like that at the moment.
> I think at the very least we should do some of that.
> 
> The premise is wanting to run the M highest priority tasks, when a CPU
> drops priority, it tries to PULL a higher priority task towards itself.
> If this PULL selects a migrate_disable() tasks, it means the task is in
> the M highest prio tasks.
> 
> Since obviously that task cannot get pulled, we should then pull the
> current running task of that CPU, this would then allow the
> migrate_disable() task to resume execution.
> 
> I'll go try and cook up something along those lines...
> 

sounds promising...

-- Daniel

Re: [PATCH 7/9] sched: Add migrate_disable()

2020-09-23 Thread Daniel Bristot de Oliveira

On 9/23/20 10:31 AM, Thomas Gleixner wrote:
> On Mon, Sep 21 2020 at 22:42, Daniel Bristot de Oliveira wrote:
>> On 9/21/20 9:16 PM, Thomas Gleixner wrote:
>>> On Mon, Sep 21 2020 at 18:36, Peter Zijlstra wrote:
>>> But seriously, I completely understand your concern vs. schedulability
>>> theories, but those theories can neither deal well with preemption
>>> disable simply because you can create other trainwrecks when enough low
>>> priority tasks run long enough in preempt disabled regions in
>>> parallel. The scheduler simply does not know ahead how long these
>>> sections will take and how many of them will run in parallel.
>>>
>>> The theories make some assumptions about preempt disable and consider it
>>> as temporary priority ceiling, but that's all assumptions as the bounds
>>> of these operations simply unknown.
>>
>> Limited preemption is something that is more explored/well known than
>> limited/arbitrary affinity - I even know a dude that convinced academics 
>> about
>> the effects/properties of preempt disable on the PREEMPT_RT!
> 
> I'm sure I never met that guy.

It is a funny Italian/Brazilian dude

>> But I think that the message here is that: ok, migrate disable is better for 
>> the
>> "scheduling latency" than preempt disable (preempt rt goal). But the
>> indiscriminate usage of migrate disable has some undesired effects for 
>> "response
>> time" of real-time threads (scheduler goal), so we should use it with 
>> caution -
>> as much as we have with preempt disable. In the end, both are critical for
>> real-time workloads, and we need more work and analysis on them both.
> ...
>>> But as the kmap discussion has shown, the current situation of enforcing
>>> preempt disable even on a !RT kernel is not pretty either. I looked at
>>> quite some of the kmap_atomic() usage sites and the resulting
>>> workarounds for non-preemptability are pretty horrible especially if
>>> they do copy_from/to_user() or such in those regions. There is tons of
>>> other code which really only requires migrate disable
>>
>> (not having an explicit declaration of the reason to disable preemption make
>> these all hard to rework... and we will have the same with migrate disable.
>> Anyways, I agree that disabling only migration helps -rt now [and I like
>> that]... but I also fear/care for scheduler metrics on the long term... well,
>> there is still a long way until retirement.)
> 
> Lets have a look at theory and practice once more:
> 
> 1) Preempt disable
> 
>Theories take that into account by adding a SHC ('Sh*t Happens
>Coefficient') into their formulas, but the practical effects cannot
>ever be reflected in theories accurately.

It depends, an adequate theory will have the correct balance between SHC and
precision. The idea is to precisely define the behavior as much as possible,
trying to reduce the SHC.

>In practice, preempt disable can cause unbound latencies and while we
>all agree that long preempt/interrupt disabled sections are bad, it's
>not really trivial to break these up without rewriting stuff from
>scratch. The recent discussion about unbound latencies in the page
>allocator is a prime example for that.

Here we need to separate two contexts: the synchronization and the code 
contexts.

At the synchronization level the preempt_disable() is bounded [1]:

The PREEMPT_RT can have only *1* preempt_disable()/enable() not to schedule
section (the worst (SHC)) before starting the process of calling the scheduler.
So the SHC factor is then reduced to the code context [2].

The SHC is in the code context, and it is mitigated by the constant monitoring
of the code sections via tests.

>The ever growing usage of per CPU storage is not making anything
>better and right now preempt disable is the only tool we have at the
>moment in mainline to deal with that.
> 
>That forces people to come up with code constructs which are more
>than suboptimal both in terms of code quality and in terms of
>schedulability/latency. We've seen mutexes converted to spinlocks
>just because of that, conditionals depending on execution context
>which turns out to be broken and inconsistent, massive error handling
>trainwrecks, etc.

I see and agree with you on this point.

> 2) Migrate disable
> 
>Theories do not know anything about it, but in the very end it's
>going to be yet another variant of SHC to be defined.

I agree. There are very few things known at this point. However, we can take as
exercise the example that Peter mentioned:

CPU 0

Re: [PATCH] sched/rt: Disable RT_RUNTIME_SHARE by default

2020-09-22 Thread Daniel Bristot de Oliveira

On 9/22/20 7:14 PM, Wei Wang wrote:
> On Mon, Sep 21, 2020 at 7:40 AM Daniel Bristot de Oliveira
>  wrote:
>>
>> The RT_RUNTIME_SHARE sched feature enables the sharing of rt_runtime
>> between CPUs, allowing a CPU to run a real-time task up to 100% of the
>> time while leaving more space for non-real-time tasks to run on the CPU
>> that lend rt_runtime.
>>
>> The problem is that a CPU can easily borrow enough rt_runtime to allow
>> a spinning rt-task to run forever, starving per-cpu tasks like kworkers,
>> which are non-real-time by design.
>>
>> This patch disables RT_RUNTIME_SHARE by default, avoiding this problem.
>> The feature will still be present for users that want to enable it,
>> though.
>>
>> Signed-off-by: Daniel Bristot de Oliveira 
>> Cc: Ingo Molnar 
>> Cc: Peter Zijlstra 
>> Cc: Juri Lelli 
>> Cc: Vincent Guittot 
>> Cc: Dietmar Eggemann 
>> Cc: Steven Rostedt 
>> Cc: Ben Segall 
>> Cc: Mel Gorman 
>> Cc: Daniel Bristot de Oliveira 
>> Cc: Thomas Gleixner 
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  kernel/sched/features.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
>> index 7481cd96f391..68d369cba9e4 100644
>> --- a/kernel/sched/features.h
>> +++ b/kernel/sched/features.h
>> @@ -77,7 +77,7 @@ SCHED_FEAT(WARN_DOUBLE_CLOCK, false)
>>  SCHED_FEAT(RT_PUSH_IPI, true)
>>  #endif
>>
>> -SCHED_FEAT(RT_RUNTIME_SHARE, true)
>> +SCHED_FEAT(RT_RUNTIME_SHARE, false)
>>  SCHED_FEAT(LB_MIN, false)
>>  SCHED_FEAT(ATTACH_AGE_LOAD, true)
>>
>> --
>> 2.26.2
>>
> 
> Tested on an Android device and can no longer see long running RT
> tasks (yes, Android have those for reasons).
> 

So:

Tested-by: Wei Wang 

?

Thanks!
-- Daniel

Re: [PATCH 7/9] sched: Add migrate_disable()

2020-09-21 Thread Daniel Bristot de Oliveira

On 9/21/20 9:16 PM, Thomas Gleixner wrote:
> On Mon, Sep 21 2020 at 18:36, Peter Zijlstra wrote:
>> Add the base migrate_disable() support (under protest).
> 
> :)
> 
>> +/*
>> + * Migrate-Disable and why it is (strongly) undesired.
>> + *
>> + * The premise of the Real-Time schedulers we have on Linux
>> + * (SCHED_FIFO/SCHED_DEADLINE) is that M CPUs can/will run M tasks
>> + * concurrently, provided there are sufficient runnable tasks, also known as
>> + * work-conserving. For instance SCHED_DEADLINE tries to schedule the M
>> + * earliest deadline threads, and SCHED_FIFO the M highest priority threads.
>> + *
>> + * The correctness of various scheduling models depends on this, but is it
>> + * broken by migrate_disable() that doesn't imply preempt_disable(). Where
>> + * preempt_disable() implies an immediate priority ceiling, preemptible
>> + * migrate_disable() allows nesting.
>> + *
>> + * The worst case is that all tasks preempt one another in a 
>> migrate_disable()
>> + * region and stack on a single CPU. This then reduces the available 
>> bandwidth
>> + * to a single CPU. And since Real-Time schedulability theory considers the
>> + * Worst-Case only, all Real-Time analysis shall revert to single-CPU
>> + * (instantly solving the SMP analysis problem).
> 
> I'm telling you for years that SMP is the source of all evils and
> NR_CPUS=0 is the ultimate solution of all problems. Paul surely
> disagrees as he thinks that NR_CPUS<0 is the right thing to do.

And I would not need to extend the model!

> But seriously, I completely understand your concern vs. schedulability
> theories, but those theories can neither deal well with preemption
> disable simply because you can create other trainwrecks when enough low
> priority tasks run long enough in preempt disabled regions in
> parallel. The scheduler simply does not know ahead how long these
> sections will take and how many of them will run in parallel.
> 
> The theories make some assumptions about preempt disable and consider it
> as temporary priority ceiling, but that's all assumptions as the bounds
> of these operations simply unknown.

Limited preemption is something that is more explored/well known than
limited/arbitrary affinity - I even know a dude that convinced academics about
the effects/properties of preempt disable on the PREEMPT_RT!

But I think that the message here is that: ok, migrate disable is better for the
"scheduling latency" than preempt disable (preempt rt goal). But the
indiscriminate usage of migrate disable has some undesired effects for "response
time" of real-time threads (scheduler goal), so we should use it with caution -
as much as we have with preempt disable. In the end, both are critical for
real-time workloads, and we need more work and analysis on them both.

>> + * The reason we have it anyway.
>> + *
>> + * PREEMPT_RT breaks a number of assumptions traditionally held. By forcing 
>> a
>> + * number of primitives into becoming preemptible, they would also allow
>> + * migration. This turns out to break a bunch of per-cpu usage. To this end,
>> + * all these primitives employ migirate_disable() to restore this implicit
>> + * assumption.
>> + *
>> + * This is a 'temporary' work-around at best. The correct solution is 
>> getting
>> + * rid of the above assumptions and reworking the code to employ explicit
>> + * per-cpu locking or short preempt-disable regions.
> 
> What timeframe are you envisioning for 'temporary'? I assume something
> which is closer to your retirement than to mine :)

/me counts how many years he still needs to wait for the retirement.

> 
>> + * The end goal must be to get rid of migrate_disable(), alternatively we 
>> need
>> + * a schedulability theory that does not depend on abritrary migration.
> 
> Finally something new the academics can twist their brain around :)

Like if there was not enough already :-)

> But as the kmap discussion has shown, the current situation of enforcing
> preempt disable even on a !RT kernel is not pretty either. I looked at
> quite some of the kmap_atomic() usage sites and the resulting
> workarounds for non-preemptability are pretty horrible especially if
> they do copy_from/to_user() or such in those regions. There is tons of
> other code which really only requires migrate disable
(not having an explicit declaration of the reason to disable preemption make
these all hard to rework... and we will have the same with migrate disable.
Anyways, I agree that disabling only migration helps -rt now [and I like
that]... but I also fear/care for scheduler metrics on the long term... well,
there is still a long way until retirement.)

Thanks!
-- Daniel

> Thanks,
> 
> tglx
>

[PATCH] x86/irq: Use printk_deferred() on raw_spin_lock() protected sections

2020-09-21 Thread Daniel Bristot de Oliveira

While testing hotplug I got this BUG:

 =
 [ BUG: Invalid wait context ]
 5.9.0-rc5+ #3 Not tainted
 -
 migration/2/20 is trying to lock:
 b4315778 (>lock){-.-.}-{3:3}, at: 
serial8250_console_write+0x8e/0x380
 other info that might help us debug this:
 context-{5:5}
 4 locks held by migration/2/20:
  #0: 91622a3ff4c0 (_desc_lock_class){-.-.}-{2:2}, at: 
irq_migrate_all_off_this_cpu+0x41/0x2f0
  #1: b2c509b8 (vector_lock){-.-.}-{2:2}, at: 
irq_force_complete_move+0x2a/0x70
  #2: b2c7cec0 (console_lock){+.+.}-{0:0}, at: printk+0x48/0x4a
  #3: b2c7cbe0 (console_owner){}-{0:0}, at: 
console_unlock+0x1af/0x650
 stack backtrace:
 CPU: 2 PID: 20 Comm: migration/2 Not tainted 5.9.0-rc5+ #3
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 
04/01/2014
 Call Trace:
  dump_stack+0x8b/0xb8
  __lock_acquire.cold+0x1ce/0x333
  ? stack_trace_save+0x3b/0x50
  ? save_trace+0x3f/0x360
  lock_acquire+0xbf/0x3e0
  ? serial8250_console_write+0x8e/0x380
  _raw_spin_lock_irqsave+0x48/0x60
  ? serial8250_console_write+0x8e/0x380
  serial8250_console_write+0x8e/0x380
  ? console_unlock+0x1e8/0x650
  console_unlock+0x3f3/0x650
  ? printk+0x48/0x4a
  vprintk_emit+0x1b1/0x440
  printk+0x48/0x4a
  irq_force_complete_move.cold+0xf/0x14
  irq_migrate_all_off_this_cpu+0xfa/0x2f0
  fixup_irqs+0x25/0xe8
  cpu_disable_common+0x2b8/0x2d0
  native_cpu_disable+0x18/0x30
  take_cpu_down+0x2f/0xa0
  multi_cpu_stop+0x6d/0x130
  ? stop_machine_yield+0x10/0x10
  cpu_stopper_thread+0x7b/0x110
  ? smpboot_thread_fn+0x26/0x1e0
  smpboot_thread_fn+0x10b/0x1e0
  ? smpboot_register_percpu_thread+0xf0/0xf0
  kthread+0x13a/0x150
  ? kthread_create_worker_on_cpu+0x40/0x40
  ret_from_fork+0x22/0x30
  smpboot: CPU 2 is now offline
 =

It was caused by printk() inside a code section protected by a
raw_spin_lock() that ended up calling a serial console that
uses a regular spin_lock().

Use the printk_deferred() to avoid calling the serial console
in a raw_spin_lock() protected section.

Signed-off-by: Daniel Bristot de Oliveira 
Suggested-by: Thomas Gleixner 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Marc Zyngier 
Cc: Peter Xu 
Cc: Andy Lutomirski 
Cc: Bjorn Helgaas 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kernel/apic/vector.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index f8a56b5dc29f..1a0e5535b8ac 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -406,8 +406,9 @@ static int activate_reserved(struct irq_data *irqd)
 */
if (!cpumask_subset(irq_data_get_effective_affinity_mask(irqd),
irq_data_get_affinity_mask(irqd))) {
-   pr_warn("irq %u: Affinity broken due to vector space 
exhaustion.\n",
-   irqd->irq);
+   printk_deferred(KERN_WARNING
+   "irq %u: Affinity broken due to vector space 
exhaustion.\n",
+   irqd->irq);
}
 
return ret;
@@ -1012,8 +1013,9 @@ void irq_force_complete_move(struct irq_desc *desc)
 * so we have the necessary information when a problem in that
 * area arises.
 */
-   pr_warn("IRQ fixup: irq %d move in progress, old vector %d\n",
-   irqd->irq, vector);
+   printk_deferred(KERN_WARNING
+   "IRQ fixup: irq %d move in progress, old vector 
%d\n",
+   irqd->irq, vector);
}
free_moved_vector(apicd);
 unlock:
@@ -1034,15 +1036,17 @@ int lapic_can_unplug_cpu(void)
tomove = irq_matrix_allocated(vector_matrix);
avl = irq_matrix_available(vector_matrix, true);
if (avl < tomove) {
-   pr_warn("CPU %u has %u vectors, %u available. Cannot disable 
CPU\n",
-   cpu, tomove, avl);
+   printk_deferred(KERN_WARNING
+   "CPU %u has %u vectors, %u available. Cannot 
disable CPU\n",
+   cpu, tomove, avl);
ret = -ENOSPC;
goto out;
}
rsvd = irq_matrix_reserved(vector_matrix);
if (avl < rsvd) {
-   pr_warn("Reserved vectors %u > available %u. IRQ request may 
fail\n",
-   rsvd, avl);
+   printk_deferred(KERN_WARNING
+   "Reserved vectors %u > available %u. IRQ 
request may fail\n",
+   rsvd, avl);
}
 out:
raw_spin_unlock(_lock);
-- 
2.26.2

[PATCH] sched/rt: Disable RT_RUNTIME_SHARE by default

2020-09-21 Thread Daniel Bristot de Oliveira

The RT_RUNTIME_SHARE sched feature enables the sharing of rt_runtime
between CPUs, allowing a CPU to run a real-time task up to 100% of the
time while leaving more space for non-real-time tasks to run on the CPU
that lend rt_runtime.

The problem is that a CPU can easily borrow enough rt_runtime to allow
a spinning rt-task to run forever, starving per-cpu tasks like kworkers,
which are non-real-time by design.

This patch disables RT_RUNTIME_SHARE by default, avoiding this problem.
The feature will still be present for users that want to enable it,
though.

Signed-off-by: Daniel Bristot de Oliveira 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/features.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 7481cd96f391..68d369cba9e4 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -77,7 +77,7 @@ SCHED_FEAT(WARN_DOUBLE_CLOCK, false)
 SCHED_FEAT(RT_PUSH_IPI, true)
 #endif
 
-SCHED_FEAT(RT_RUNTIME_SHARE, true)
+SCHED_FEAT(RT_RUNTIME_SHARE, false)
 SCHED_FEAT(LB_MIN, false)
 SCHED_FEAT(ATTACH_AGE_LOAD, true)
 
-- 
2.26.2

[ANNOUNCE][CFP] Real-Time Linux Mini-Summit 2020 - Virtual

2020-09-18 Thread Daniel Bristot de Oliveira

The Real-Time Mini-Summit is organized by the Linux Foundation Real-Time Linux
(RTL) collaborative project. The event is intended to gather developers and
users of Linux as a Real-Time Operating System. The main intent is to provide
room for discussion between developers, tooling experts, and users.

The mini-summit will take place alongside the Open Source Summit + Embedded
Linux Conference Europe 2020, virtually. The summit is planned last day of the
main conference, Friday, October 30th, 2020.

If you are interested to present, please submit a proposal [1] before September
30th, 2020, at 23:59 EST. Please provide a title, an abstract describing the
proposed talk, a short biography, and describe the targeted audience. Please
indicate the slot length you are aiming for: The format is a single track with
presentation slots of 15, 30 minutes long. Considering that the presentation
should leave a space for discussion.

We are welcoming presentations from both end-users and developers, on topics
covering, but not limited to:

 - Real-time Linux development
 - Real-time Linux evaluation
 - Real-time Linux use cases (Success and failures)
 - Real-time Linux tooling (tracing, configuration, …)

Those can cover recently available technologies, ongoing work, and new ideas.

Important Notes for Speakers:

All speakers are required to adhere to the Linux Foundation events’ Code of
Conduct. We also highly recommend that speakers take the Linux Foundation online
Inclusive Speaker Orientation Course. Avoid sales or marketing pitches and
discussing unlicensed or potentially closed-source technologies when preparing
your proposal; these talks are almost always rejected due to the fact that they
take away from the integrity of our events, and are rarely well-received by
conference attendees.

All accepted speakers are required to submit their slides prior to the event.

[1] Submission page: https://forms.gle/yQeqyrtJYezM5VRJA

Important Dates:

CFP Close: Saturday, September 30th, 2020, at 23:59 PST
Speaker notification: October 10th, 2020
Conference: Friday, October 30th, 2020

Questions on submitting a proposal?
Email Daniel Bristot de Oliveira

Re: [PATCH] sched/deadline: Fix stale throttling on de-/boosted tasks

2020-09-16 Thread Daniel Bristot de Oliveira

On 9/2/20 11:43 AM, pet...@infradead.org wrote:
> On Wed, Sep 02, 2020 at 08:00:24AM +0200, Juri Lelli wrote:
>> On 31/08/20 13:07, Lucas Stach wrote:
>>> When a boosted task gets throttled, what normally happens is that it's
>>> immediately enqueued again with ENQUEUE_REPLENISH, which replenishes the
>>> runtime and clears the dl_throttled flag. There is a special case however:
>>> if the throttling happened on sched-out and the task has been deboosted in
>>> the meantime, the replenish is skipped as the task will return to its
>>> normal scheduling class. This leaves the task with the dl_throttled flag
>>> set.
>>>
>>> Now if the task gets boosted up to the deadline scheduling class again
>>> while it is sleeping, it's still in the throttled state. The normal wakeup
>>> however will enqueue the task with ENQUEUE_REPLENISH not set, so we don't
>>> actually place it on the rq. Thus we end up with a task that is runnable,
>>> but not actually on the rq and neither a immediate replenishment happens,
>>> nor is the replenishment timer set up, so the task is stuck in
>>> forever-throttled limbo.
>>>
>>> Clear the dl_throttled flag before dropping back to the normal scheduling
>>> class to fix this issue.
>>>
>>> Signed-off-by: Lucas Stach 
> 
>> Acked-by: Juri Lelli 

I faced a similar issue, but involving DL tasks (not !DL):
  
https://lore.kernel.org/lkml/5076e003450835ec74e6fa5917d02c4fa41687e6.1600170294.git.bris...@redhat.com/

While debugging that problem, I reviewed and tested this patch,
and we need it. So:

Reviewed-by: Daniel Bristot de Oliveira 

Thanks!
-- Daniel


> Thanks!
>

[PATCH] sched/deadline: Unthrottle PI boosted threads while enqueuing

2020-09-16 Thread Daniel Bristot de Oliveira

stress-ng has a test (stress-ng --cyclic) that creates a set of threads
under SCHED_DEADLINE with the following parameters:

dl_runtime   =  1 (10 us)
dl_deadline  = 10 (100 us)
dl_period= 10 (100 us)

These parameters are very aggressive. When using a system without HRTICK
set, these threads can easily execute longer than the dl_runtime because
the throttling happens with 1/HZ resolution.

During the main part of the test, the system works just fine because
the workload does not try to run over the 10 us. The problem happens at
the end of the test, on the exit() path. During exit(), the threads need
to do some cleanups that require real-time mutex locks, mainly those
related to memory management, resulting in this scenario:

Note: locks are rt_mutexes...
 
TASK A: TASK B: TASK C:
activation
activation
activation

lock(a): OK!lock(b): OK!

lock(a)
-> block (task A owns it)
  -> self notice/set throttled
 +--< -> arm replenished timer
 |  switch-out
 |  lock(b)
 |  ->  B prio>
 |  -> boost TASK B
 |  unlock(a)   switch-out
 |  -> handle lock a to B
 |-> wakeup(B)
 |  -> B is throttled:
 |-> do not enqueue
 | switch-out
 |
 |
 +-> replenishment timer
-> TASK B is boosted:
  -> do not enqueue
 

BOOM: TASK B is runnable but !enqueued, holding TASK C: the system
crashes with hung task C.

This problem is avoided by removing the throttle state from the boosted
thread while boosting it (by TASK A in the example above), allowing it to
be queued and run boosted.

The next replenishment will take care of the runtime overrun, pushing
the deadline further away. See the "while (dl_se->runtime <= 0)" on
replenish_dl_entity() for more information.

Signed-off-by: Daniel Bristot de Oliveira 
Reported-by: Mark Simmons 
Reviewed-by: Juri Lelli 
Tested-by: Mark Simmons 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: linux-kernel@vger.kernel.org

---
 kernel/sched/deadline.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 3862a28cd05d..50ba5fca0403 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1525,6 +1525,27 @@ static void enqueue_task_dl(struct rq *rq, struct 
task_struct *p, int flags)
 */
if (pi_task && dl_prio(pi_task->normal_prio) && p->dl.dl_boosted) {
pi_se = _task->dl;
+   /*
+* Because of delays in the detection of the overrun of a
+* thread's runtime, it might be the case that a thread
+* goes to sleep in a rt mutex with negative runtime. As
+* a consequence, the thread will be throttled.
+*
+* While waiting for the mutex, this thread can also be
+* boosted via PI, resulting in a thread that is throttled
+* and boosted at the same time.
+*
+* In this case, the boost overrides the throttle.
+*/
+   if (p->dl.dl_throttled) {
+   /*
+* The replenish timer needs to be canceled. No
+* problem if it fires concurrently: boosted threads
+* are ignored in dl_task_timer().
+*/
+   hrtimer_try_to_cancel(>dl.dl_timer);
+   p->dl.dl_throttled = 0;
+   }
} else if (!dl_prio(p->normal_prio)) {
/*
 * Special case in which we have a !SCHED_DEADLINE task
-- 
2.26.2

Re: [PATCH] MAINTAINERS: Add myself as SCHED_DEADLINE reviewer

2020-09-04 Thread Daniel Bristot de Oliveira

On 9/4/20 2:36 PM, Ingo Molnar wrote:
> Welcome Daniel! :-)
> 
> I've applied the patch to tip:sched/core.

Thanks Ingo! :D

-- Daniel

[tip: sched/core] MAINTAINERS: Add myself as SCHED_DEADLINE reviewer

2020-09-04 Thread tip-bot2 for Daniel Bristot de Oliveira

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 153908ebc8b5721658c4aba92779fc6e3e2d5a61
Gitweb:
https://git.kernel.org/tip/153908ebc8b5721658c4aba92779fc6e3e2d5a61
Author:Daniel Bristot de Oliveira 
AuthorDate:Fri, 04 Sep 2020 11:26:23 +02:00
Committer: Ingo Molnar 
CommitterDate: Fri, 04 Sep 2020 14:35:40 +02:00

MAINTAINERS: Add myself as SCHED_DEADLINE reviewer

As discussed with Juri and Peter.

Signed-off-by: Daniel Bristot de Oliveira 
Signed-off-by: Ingo Molnar 
Acked-by: Peter Zijlstra 
Acked-by: Juri Lelli 
Link: 
https://lore.kernel.org/r/4476a6da70949913a59dab9aacfbd12162c1fbd7.1599146667.git.bris...@redhat.com
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index deaafb6..713f82b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15355,6 +15355,7 @@ R:  Dietmar Eggemann  
(SCHED_NORMAL)
 R: Steven Rostedt  (SCHED_FIFO/SCHED_RR)
 R: Ben Segall  (CONFIG_CFS_BANDWIDTH)
 R: Mel Gorman  (CONFIG_NUMA_BALANCING)
+R: Daniel Bristot de Oliveira  (SCHED_DEADLINE)
 L: linux-kernel@vger.kernel.org
 S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core

Re: [PATCH] MAINTAINERS: Add myself as SCHED_DEADLINE reviewer

2020-09-04 Thread Daniel Bristot de Oliveira

On 9/4/20 11:53 AM, Juri Lelli wrote:
> Thanks for the help Daniel!

My pleasure!

Thanks, Juri!
-- Daniel

Re: [PATCH] MAINTAINERS: Add myself as SCHED_DEADLINE reviewer

2020-09-04 Thread Daniel Bristot de Oliveira

On 9/4/20 11:39 AM, pet...@infradead.org wrote:
> Absolutely, great to have you looking over that!

Thanks Peter :-)

-- Daniel

[PATCH] MAINTAINERS: Add myself as SCHED_DEADLINE reviewer

2020-09-04 Thread Daniel Bristot de Oliveira

As discussed with Juri and Peter.

Signed-off-by: Daniel Bristot de Oliveira 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: linux-kernel@vger.kernel.org
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 592467ba3f4d..56d185210a43 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15363,6 +15363,7 @@ R:  Dietmar Eggemann  
(SCHED_NORMAL)
 R: Steven Rostedt  (SCHED_FIFO/SCHED_RR)
 R: Ben Segall  (CONFIG_CFS_BANDWIDTH)
 R: Mel Gorman  (CONFIG_NUMA_BALANCING)
+R: Daniel Bristot de Oliveira  (SCHED_DEADLINE)
 L: linux-kernel@vger.kernel.org
 S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core
-- 
2.26.2

Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

2020-08-07 Thread Daniel Bristot de Oliveira

On 8/7/20 4:11 PM, pet...@infradead.org wrote:
> But I shelved all that after I heard about that other balancer idea
> Danial was suppose to be working on ;-)))

The PhD bureaucracy (and behind the scenes) were blocking me... but I am free
man now and will catch up on that ;-).

[ also because I made enough progress on this:

https://github.com/bristot/rtsl/

and can start other things. ]

-- Daniel

Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

2020-08-07 Thread Daniel Bristot de Oliveira

On 8/7/20 12:46 PM, pet...@infradead.org wrote:
> On Fri, Aug 07, 2020 at 11:56:04AM +0200, Juri Lelli wrote:
>> Starting deadline server for lower priority classes right away when
>> first task is enqueued might break guarantees, as tasks belonging to
>> intermediate priority classes could be uselessly preempted. E.g., a well
>> behaving (non hog) FIFO task can be preempted by NORMAL tasks even if
>> there are still CPU cycles available for NORMAL tasks to run, as they'll
>> be running inside the fair deadline server for some period of time.
>>
>> To prevent this issue, implement a starvation monitor mechanism that
>> starts the deadline server only if a (fair in this case) task hasn't
>> been scheduled for some interval of time after it has been enqueued.
>> Use pick/put functions to manage starvation monitor status.
> One thing I considerd was scheduling this as a least-laxity entity --
> such that it runs late, not early -- and start the server when
> rq->nr_running != rq->cfs.h_nr_running, IOW when there's !fair tasks
> around.
> 
> Not saying we should do it like that, but that's perhaps more
> deterministic than this.
> 

I agree, what we want here is something that schedules the server if it still
retains some runtime when the laxity is 0. But this is easier said than done, as
this would require another scheduler (other pros and cons and analysis (and
hours of work)...).

But, for the starvation monitor purpose, the goal is not (necessarily) to
provide a deterministic guarantee for the starving task, but to avoid system
issues while minimizing the damage to the "real" real-time workload. With that
in mind, we could relax our ambitions...

Thoughts?

-- Daniel

Re: [ANNOUNCE][CFP] Real-Time Micro Conference at LPC 2020

2020-08-06 Thread Daniel Bristot de Oliveira

On 7/3/20 6:51 PM, Daniel Bristot de Oliveira wrote:
> Proposals must be submitted by August 2nd, and submitters will be notified of
> acceptance by August 9th.

Because some people asked for more time to work on their proposals, we ended up
leaving the submission open during this week, postponing the deadline to August
9th, with notifications not later than August 16th!

[Feel free to ping me in case of doubts]

Thanks!
-- Daniel

1 2 3 4 5 6 7 >

1 - 100 of 676 matches

Mail list logo