Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()
On 6/5/24 15:24, Qais Yousef wrote: >>> But rt is a shortened version of realtime, and so it is making *it less* >>> clear that we also have DL here. >> Can SCHED_DL be considered a real-time scheduling class as in opposite >> to SCHED_BATCH for instance? Due to its requirements it fits for a real >> time scheduling class, right? >> And RT (as in real time) already includes SCHED_RR and SCHED_FIFO. > Yeah I think the usage of realtime to cover both makes sense. I followed your > precedence with task_is_realtime(). > > Anyway. If people really find this confusing, what would make sense is to > split > them and ask users to call rt_task() and dl_task() explicitly without this > wrapper. I personally like it better with the wrapper. But happy to follow the > crowd. For me, doing dl_ things it is better to keep them separate, so I can easily search for dl_ specific checks. rt_or_dl_task(p); would also make it clear that we have both. -- Daniel
Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()
On 6/5/24 11:32, Sebastian Andrzej Siewior wrote: > On 2024-06-04 17:57:46 [+0200], Daniel Bristot de Oliveira wrote: >> On 6/4/24 16:42, Qais Yousef wrote: >>> - (wakeup_rt && !dl_task(p) && !rt_task(p)) || >>> + (wakeup_rt && !realtime_task(p)) || >> >> I do not like bikeshedding, and no hard feelings... >> >> But rt is a shortened version of realtime, and so it is making *it less* >> clear that we also have DL here. > > Can SCHED_DL be considered a real-time scheduling class as in opposite > to SCHED_BATCH for instance? Due to its requirements it fits for a real > time scheduling class, right? > And RT (as in real time) already includes SCHED_RR and SCHED_FIFO. It is a real-time scheduler, but the problem is that FIFO and RR are in rt.c and they are called the "realtime" ones, so they are the first to come in mind. -- Daniel >> -- Daniel > > Sebastian >
Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()
On 6/4/24 18:37, Steven Rostedt wrote: > On Tue, 4 Jun 2024 17:57:46 +0200 > Daniel Bristot de Oliveira wrote: > >> On 6/4/24 16:42, Qais Yousef wrote: >>> - (wakeup_rt && !dl_task(p) && !rt_task(p)) || >>> + (wakeup_rt && !realtime_task(p)) || >> >> I do not like bikeshedding, and no hard feelings... >> >> But rt is a shortened version of realtime, and so it is making *it less* >> clear that we also have DL here. >> >> I know we can always read the comments, but we can do without changes >> as well... >> >> I would suggest finding the plural version for realtime_task()... so >> we know it is not about the "rt" scheduler, but rt and dl schedulers. > > priority_task() ? rt_or_dl_task() ? rt_schedulers_task() ? higher_than_fair_task() ? (this is bad haha) I am not good with names, and it is hard to find one, I know but something to make it clear that dl is also there becase rt/realtime is ambiguous with the rt.c. -- Daniel
Re: [PATCH v5 1/2] sched/rt: Clean up usage of rt_task()
On 6/4/24 16:42, Qais Yousef wrote: > - (wakeup_rt && !dl_task(p) && !rt_task(p)) || > + (wakeup_rt && !realtime_task(p)) || I do not like bikeshedding, and no hard feelings... But rt is a shortened version of realtime, and so it is making *it less* clear that we also have DL here. I know we can always read the comments, but we can do without changes as well... I would suggest finding the plural version for realtime_task()... so we know it is not about the "rt" scheduler, but rt and dl schedulers. -- Daniel
Re: [PATCH] rv: Update rv_en(dis)able_monitor doc to match kernel-doc
On 5/20/24 07:42, Yang Li wrote: > The patch updates the function documentation comment for > rv_en(dis)able_monitor to adhere to the kernel-doc specification. > > Signed-off-by: Yang Li Acked-by: Daniel Bristot de Oliveira Thanks -- Daniel
Re: [PATCH -next] rv: Update rv_en(dis)able_monitor doc to match kernel-doc
Hi Yang On 5/17/24 11:14, Yang Li wrote: > The patch updates the function documentation comment for > rv_en(dis)able_monitor to adhere to the kernel-doc specification. > > Signed-off-by: Yang Li > --- > kernel/trace/rv/rv.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c > index 2f68e93fff0b..df0745a42a3f 100644 > --- a/kernel/trace/rv/rv.c > +++ b/kernel/trace/rv/rv.c > @@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def > *mdef, bool sync) > > /** > * rv_disable_monitor - disable a given runtime monitor > + * @mdef: Pointer to the monitor definition structure. This change is in for mainline kernel, why are you using the -next on the Subject? -- Daniel
[GIT PULL] tracing/tools: Updates for 6.10
Steven, RTLA: Specific for timerlat: - Improves the output of timerlat top by adding a missing \n, and by avoiding printing color-formatting characters where they are translated to regular characters. - Improves timerlat auto-analysis output by replacing '\t' with spaces to avoid copy-and-paste issues when reporting problems. - For timerlat, make the user-space (-u) option the default, as it is the most complete test. Add a -k option to use the in-kernel workload. - On timerlat top and hist, add a summary with the overall results. For instance, the minimum value for all CPUs, the overall average and the maximum value from all CPUs. - timerlat hist was printing initial values (i.e., 0 as max, and ~0 as min) if the trace stopped before the first Ret-User event. This problem was fixed by printing the " - " no value string to the output if that was the case. For all RTLA tools: - Add a --warm-up option, allowing the workload to run for before starting to collect results. - Add a --trace-buffer-size option, allowing the user to set the tracing buffer size for -t option. This option is mainly useful for reducing the trace file. Now rtla depends on libtracefs >= 1.6. - Fix the -t [trace_file] parsing, now it does not require the '=' before the option parameter, and better handles the multiple ways a user can pass the trace_file.txt Please pull the latest trace-tools-v6.10 tree, which can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git trace-tools-v6.10 Tag SHA1: dbd633e7f81bac0114f609ff2eade9f4e66a28e2 Head SHA1: 59c22f70b2951d81de410d477ae536ba951b4f37 Daniel Bristot de Oliveira (8): rtla/timerlat: Simplify "no value" printing on top rtla/auto-analysis: Replace \t with spaces rtla/timerlat: Use pretty formatting only on interactive tty rtla/timerlat: Add a summary for top mode rtla/timerlat: Add a summary for hist mode rtla: Add the --warm-up option rtla/timerlat: Make user-space threads the default rtla: Add --trace-buffer-size option John Kacur (3): rtla/timerlat: Fix histogram report when a cpu count is 0 rtla: Fix -t\--trace[=file] rtla: Documentation: Fix -t, --trace Documentation/tools/rtla/common_options.rst| 11 +- .../tools/rtla/common_osnoise_options.rst | 4 + .../tools/rtla/common_timerlat_options.rst | 10 +- tools/tracing/rtla/Makefile.config | 2 +- tools/tracing/rtla/src/osnoise_hist.c | 55 +++- tools/tracing/rtla/src/osnoise_top.c | 55 +++- tools/tracing/rtla/src/timerlat_aa.c | 109 tools/tracing/rtla/src/timerlat_hist.c | 294 ++--- tools/tracing/rtla/src/timerlat_top.c | 250 +++--- tools/tracing/rtla/src/trace.c | 15 ++ tools/tracing/rtla/src/trace.h | 1 + 11 files changed, 653 insertions(+), 153 deletions(-)
[GIT PULL V2] tracing/tools: Updates for 6.9
Steven, [ I rebased my queue on top of the v6.8 tag. ] Tracing: - Update makefiles for latency-collector and RTLA, using tools/build/ makefiles like perf does, inheriting its benefits. For example, having a proper way to handle dependencies. - The timerlat tracer has an interface for any tool to use. rtla timerlat tool uses this interface dispatching its own threads as workload. But, rtla timerlat could also be used for any other process. So, add 'rtla timerlat -U' option, allowing the timerlat tool to measure the latency of any task using the timerlat tracer interface. Verification: - Update makefiles for verification/rv, using tools/build/ makefiles like perf does, inheriting its benefits. For example, having a proper way to handle dependencies. Please pull the latest trace-tools-v6.9-2 tree, which can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git trace-tools-v6.9-2 Tag SHA1: e8d5e0f38601c3718874d95db2a0020ab1c454df Head SHA1: a23c05fd76cf4ad27e0c74f7a93e7b089e94a55c Daniel Bristot de Oliveira (4): tools/tracing: Use tools/build makefiles on latency-collector tools/rtla: Use tools/build makefiles to build rtla tools/verification: Use tools/build makefiles on rv tools/rtla: Add -U/--user-load option to timerlat .../tools/rtla/common_timerlat_options.rst | 6 + tools/tracing/latency/.gitignore | 5 +- tools/tracing/latency/Build| 1 + tools/tracing/latency/Makefile | 105 -- tools/tracing/latency/Makefile.config | 30 +++ tools/tracing/rtla/.gitignore | 7 +- tools/tracing/rtla/Build | 1 + tools/tracing/rtla/Makefile| 217 +++-- tools/tracing/rtla/Makefile.config | 47 + tools/tracing/rtla/Makefile.rtla | 80 tools/tracing/rtla/Makefile.standalone | 26 +++ tools/tracing/rtla/sample/timerlat_load.py | 74 +++ tools/tracing/rtla/src/Build | 11 ++ tools/tracing/rtla/src/timerlat_hist.c | 16 +- tools/tracing/rtla/src/timerlat_top.c | 14 +- tools/verification/rv/.gitignore | 6 + tools/verification/rv/Build| 1 + tools/verification/rv/Makefile | 207 +++- tools/verification/rv/Makefile.config | 47 + tools/verification/rv/Makefile.rv | 51 + tools/verification/rv/src/Build| 4 + 21 files changed, 650 insertions(+), 306 deletions(-) create mode 100644 tools/tracing/latency/Build create mode 100644 tools/tracing/latency/Makefile.config create mode 100644 tools/tracing/rtla/Build create mode 100644 tools/tracing/rtla/Makefile.config create mode 100644 tools/tracing/rtla/Makefile.rtla create mode 100644 tools/tracing/rtla/Makefile.standalone create mode 100644 tools/tracing/rtla/sample/timerlat_load.py create mode 100644 tools/tracing/rtla/src/Build create mode 100644 tools/verification/rv/.gitignore create mode 100644 tools/verification/rv/Build create mode 100644 tools/verification/rv/Makefile.config create mode 100644 tools/verification/rv/Makefile.rv create mode 100644 tools/verification/rv/src/Build ---
Re: [GIT PULL] tracing/tools: Updates for 6.9
On 3/20/24 00:02, Steven Rostedt wrote: > On Mon, 18 Mar 2024 18:41:13 +0100 > Daniel Bristot de Oliveira wrote: > >> Steven, >> >> Tracing tooling updates for 6.9 >> >> Tracing: >> - Update makefiles for latency-collector and RTLA, >> using tools/build/ makefiles like perf does, inheriting >> its benefits. For example, having a proper way to >> handle dependencies. >> >> - The timerlat tracer has an interface for any tool to use. >> rtla timerlat tool uses this interface dispatching its >> own threads as workload. But, rtla timerlat could also be >> used for any other process. So, add 'rtla timerlat -U' >> option, allowing the timerlat tool to measure the latency of >> any task using the timerlat tracer interface. >> >> Verification: >> - Update makefiles for verification/rv, using tools/build/ >> makefiles like perf does, inheriting its benefits. >> For example, having a proper way to handle dependencies. >> >> >> Please pull the latest trace-tools-v6.9 tree, which can be found at: >> >> >> git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git >> trace-tools-v6.9 > > Looks like you just built on top of a random commit from Linus's tree: yeah :-/ > commit f6cef5f8c37f58a3bc95b3754c3ae98e086631ca > Merge: 906a93befec8 8f06fb458539 > Author: Linus Torvalds > Date: Sun Mar 17 16:59:33 2024 -0700 > > Merge tag 'i3c/for-6.9' of > git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux > > Linus prefers basing off of real tags or previous pulls from us. Ack, took note. I will do on top v6.8 tag. > Can you rebase your changes on v6.8 and resend? > > $ git checkout v6.8 > $ git cherry-pick f6cef5f8c37f58a3bc95b3754c3ae98e086631ca..trace-tools-v6.9 > > Appears to work fine. questions: when something go wrong in a pull request - Should I keep the old tag, and then do another one with -2 (it seems you do like this), or delete the old tag and send it again with the same name? - Should I resend the PULL request with something in the log or at the Subject saying it is a v2 of the pull request? I could ask via chat, but I think it is good for the community to have access to these info. Thanks! -- Daniel > Thanks! > > -- Steve
[GIT PULL] tracing/tools: Updates for 6.9
Steven, Tracing tooling updates for 6.9 Tracing: - Update makefiles for latency-collector and RTLA, using tools/build/ makefiles like perf does, inheriting its benefits. For example, having a proper way to handle dependencies. - The timerlat tracer has an interface for any tool to use. rtla timerlat tool uses this interface dispatching its own threads as workload. But, rtla timerlat could also be used for any other process. So, add 'rtla timerlat -U' option, allowing the timerlat tool to measure the latency of any task using the timerlat tracer interface. Verification: - Update makefiles for verification/rv, using tools/build/ makefiles like perf does, inheriting its benefits. For example, having a proper way to handle dependencies. Please pull the latest trace-tools-v6.9 tree, which can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git trace-tools-v6.9 Tag SHA1: 2eb09a97c56af3c27bd9dcebccb495f70d56d5c0 Head SHA1: 9c63d9f58a42b979a42bcaed534d9246996ac0d9 Daniel Bristot de Oliveira (4): tools/tracing: Use tools/build makefiles on latency-collector tools/rtla: Use tools/build makefiles to build rtla tools/verification: Use tools/build makefiles on rv tools/rtla: Add -U/--user-load option to timerlat .../tools/rtla/common_timerlat_options.rst | 6 + tools/tracing/latency/.gitignore | 5 +- tools/tracing/latency/Build| 1 + tools/tracing/latency/Makefile | 105 -- tools/tracing/latency/Makefile.config | 30 +++ tools/tracing/rtla/.gitignore | 7 +- tools/tracing/rtla/Build | 1 + tools/tracing/rtla/Makefile| 217 +++-- tools/tracing/rtla/Makefile.config | 47 + tools/tracing/rtla/Makefile.rtla | 80 tools/tracing/rtla/Makefile.standalone | 26 +++ tools/tracing/rtla/sample/timerlat_load.py | 74 +++ tools/tracing/rtla/src/Build | 11 ++ tools/tracing/rtla/src/timerlat_hist.c | 16 +- tools/tracing/rtla/src/timerlat_top.c | 14 +- tools/verification/rv/.gitignore | 6 + tools/verification/rv/Build| 1 + tools/verification/rv/Makefile | 207 +++- tools/verification/rv/Makefile.config | 47 + tools/verification/rv/Makefile.rv | 51 + tools/verification/rv/src/Build| 4 + 21 files changed, 650 insertions(+), 306 deletions(-) create mode 100644 tools/tracing/latency/Build create mode 100644 tools/tracing/latency/Makefile.config create mode 100644 tools/tracing/rtla/Build create mode 100644 tools/tracing/rtla/Makefile.config create mode 100644 tools/tracing/rtla/Makefile.rtla create mode 100644 tools/tracing/rtla/Makefile.standalone create mode 100644 tools/tracing/rtla/sample/timerlat_load.py create mode 100644 tools/tracing/rtla/src/Build create mode 100644 tools/verification/rv/.gitignore create mode 100644 tools/verification/rv/Build create mode 100644 tools/verification/rv/Makefile.config create mode 100644 tools/verification/rv/Makefile.rv create mode 100644 tools/verification/rv/src/Build ---
[PATCH V3 3/3] tools/verification: Use tools/build makefiles on rv
Use tools/build/ makefiles to build rv, inheriting the benefits of it. For example, having a proper way to handle dependencies. Suggested-by: Linus Torvalds Signed-off-by: Daniel Bristot de Oliveira --- tools/verification/rv/.gitignore | 6 + tools/verification/rv/Build | 1 + tools/verification/rv/Makefile| 207 +- tools/verification/rv/Makefile.config | 47 ++ tools/verification/rv/Makefile.rv | 51 +++ tools/verification/rv/src/Build | 4 + 6 files changed, 183 insertions(+), 133 deletions(-) create mode 100644 tools/verification/rv/.gitignore create mode 100644 tools/verification/rv/Build create mode 100644 tools/verification/rv/Makefile.config create mode 100644 tools/verification/rv/Makefile.rv create mode 100644 tools/verification/rv/src/Build diff --git a/tools/verification/rv/.gitignore b/tools/verification/rv/.gitignore new file mode 100644 index ..34a486585a34 --- /dev/null +++ b/tools/verification/rv/.gitignore @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0-only +rv +rv-static +fixdep +feature +FEATURE-DUMP diff --git a/tools/verification/rv/Build b/tools/verification/rv/Build new file mode 100644 index ..a44c22349d4b --- /dev/null +++ b/tools/verification/rv/Build @@ -0,0 +1 @@ +rv-y += src/ diff --git a/tools/verification/rv/Makefile b/tools/verification/rv/Makefile index 485f8aeddbe0..411d62b3d8eb 100644 --- a/tools/verification/rv/Makefile +++ b/tools/verification/rv/Makefile @@ -1,146 +1,87 @@ -NAME := rv -# Follow the kernel version -VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. kernelversion | grep -v make) - -# From libtracefs: -# Makefiles suck: This macro sets a default value of $(2) for the -# variable named by $(1), unless the variable has been set by -# environment or command line. This is necessary for CC and AR -# because make sets default values, so the simpler ?= approach -# won't work as expected. -define allow-override - $(if $(or $(findstring environment,$(origin $(1))),\ -$(findstring command line,$(origin $(1,,\ -$(eval $(1) = $(2))) -endef - -# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix. -$(call allow-override,CC,$(CROSS_COMPILE)gcc) -$(call allow-override,AR,$(CROSS_COMPILE)ar) -$(call allow-override,STRIP,$(CROSS_COMPILE)strip) -$(call allow-override,PKG_CONFIG,pkg-config) -$(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/) -$(call allow-override,LDCONFIG,ldconfig) - -INSTALL= install -MKDIR = mkdir -FOPTS := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \ - -fasynchronous-unwind-tables -fstack-clash-protection -WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized - -ifeq ($(CC),clang) - FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS)) - WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS)) +# SPDX-License-Identifier: GPL-2.0-only + +ifeq ($(srctree),) + srctree := $(patsubst %/,%,$(dir $(CURDIR))) + srctree := $(patsubst %/,%,$(dir $(srctree))) + srctree := $(patsubst %/,%,$(dir $(srctree))) endif -TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs) - -CFLAGS := -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -I include -LDFLAGS:= -flto=auto -ggdb $(EXTRA_LDFLAGS) -LIBS := $$($(PKG_CONFIG) --libs libtracefs) - -SRC:= $(wildcard src/*.c) -HDR:= $(wildcard src/*.h) -OBJ:= $(SRC:.c=.o) -DIRS := src -FILES := Makefile README.txt -CEXT := bz2 -TARBALL:= $(NAME)-$(VERSION).tar.$(CEXT) -TAROPTS:= -cvjf $(TARBALL) -BINDIR := /usr/bin -DATADIR:= /usr/share -DOCDIR := $(DATADIR)/doc -MANDIR := $(DATADIR)/man -LICDIR := $(DATADIR)/licenses -SRCTREE:= $(or $(BUILD_SRC),$(CURDIR)) - -# If running from the tarball, man pages are stored in the Documentation -# dir. If running from the kernel source, man pages are stored in -# Documentation/tools/rv/. -ifneq ($(wildcard Documentation/.*),) -DOCSRC = Documentation/ +include $(srctree)/tools/scripts/Makefile.include + +# O is an alias for OUTPUT +OUTPUT := $(O) + +ifeq ($(OUTPUT),) + OUTPUT := $(CURDIR)/ else -DOCSRC = $(SRCTREE)/../../../Documentation/tools/rv/ + # subdir is used by the ../Makefile in $(call descend,) + ifneq ($(subdir),) +OUTPUT:= $(OUTPUT)/$(subdir) + endif +endif + +ifneq ($(patsubst %/,,$(lastword $(OUTPUT))),) + OUTPUT := $(OUTPUT)/ endif -LIBTRACEEVENT_MIN_VERSION = 1.5 -LIBTRACEFS_MIN_VERSION = 1.3 +RV := $(OUTPUT)rv +RV_IN := $(RV)-in.o -.PHONY:all warnings show_warnings -all: warnings rv +VERSION:= $(shell sh -c "make -sC ../../.. kernelversion | grep
[PATCH V3 2/3] tools/rtla: Use tools/build makefiles to build rtla
Use tools/build/ makefiles to build rtla, inheriting the benefits of it. For example, having a proper way to handle dependencies. rtla is built using perf infra-structure when building inside the kernel tree. At this point, rtla diverges from perf in two points: Documentation and tarball generation/build. At the documentation level, rtla is one step ahead, placing the documentation at Documentation/tools/rtla/, using the same build tools as kernel documentation. The idea is to move perf documentation to the same scheme and then share the same makefiles. rtla has a tarball target that the (old) RHEL8 uses. The tarball was kept using a simple standalone makefile for compatibility. The standalone makefile shares most of the code, e.g., flags, with regular buildings. The tarball method was set as deprecated. If necessary, we can make a rtla tarball like perf, which includes the entire tools/build. But this would also require changes in the user side (the directory structure changes, and probably the deps to build the package). Inspired on perf and objtool. Suggested-by: Linus Torvalds Signed-off-by: Daniel Bristot de Oliveira --- tools/tracing/rtla/.gitignore | 7 +- tools/tracing/rtla/Build | 1 + tools/tracing/rtla/Makefile| 217 + tools/tracing/rtla/Makefile.config | 47 ++ tools/tracing/rtla/Makefile.rtla | 80 + tools/tracing/rtla/Makefile.standalone | 26 +++ tools/tracing/rtla/src/Build | 11 ++ 7 files changed, 244 insertions(+), 145 deletions(-) create mode 100644 tools/tracing/rtla/Build create mode 100644 tools/tracing/rtla/Makefile.config create mode 100644 tools/tracing/rtla/Makefile.rtla create mode 100644 tools/tracing/rtla/Makefile.standalone create mode 100644 tools/tracing/rtla/src/Build diff --git a/tools/tracing/rtla/.gitignore b/tools/tracing/rtla/.gitignore index e9df32419b2b..293f0dbb0ca2 100644 --- a/tools/tracing/rtla/.gitignore +++ b/tools/tracing/rtla/.gitignore @@ -1 +1,6 @@ -/rtla +# SPDX-License-Identifier: GPL-2.0-only +rtla +rtla-static +fixdep +feature +FEATURE-DUMP diff --git a/tools/tracing/rtla/Build b/tools/tracing/rtla/Build new file mode 100644 index ..6c9d5b36a315 --- /dev/null +++ b/tools/tracing/rtla/Build @@ -0,0 +1 @@ +rtla-y += src/ diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile index afd18c678ff5..b5878be36125 100644 --- a/tools/tracing/rtla/Makefile +++ b/tools/tracing/rtla/Makefile @@ -1,157 +1,86 @@ -NAME := rtla -# Follow the kernel version -VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. kernelversion | grep -v make) - -# From libtracefs: -# Makefiles suck: This macro sets a default value of $(2) for the -# variable named by $(1), unless the variable has been set by -# environment or command line. This is necessary for CC and AR -# because make sets default values, so the simpler ?= approach -# won't work as expected. -define allow-override - $(if $(or $(findstring environment,$(origin $(1))),\ -$(findstring command line,$(origin $(1,,\ -$(eval $(1) = $(2))) -endef - -# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix. -$(call allow-override,CC,$(CROSS_COMPILE)gcc) -$(call allow-override,AR,$(CROSS_COMPILE)ar) -$(call allow-override,STRIP,$(CROSS_COMPILE)strip) -$(call allow-override,PKG_CONFIG,pkg-config) -$(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/) -$(call allow-override,LDCONFIG,ldconfig) - -INSTALL= install -MKDIR = mkdir -FOPTS := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \ - -fasynchronous-unwind-tables -fstack-clash-protection -WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized - -ifeq ($(CC),clang) - FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS)) - WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS)) +# SPDX-License-Identifier: GPL-2.0-only + +ifeq ($(srctree),) + srctree := $(patsubst %/,%,$(dir $(CURDIR))) + srctree := $(patsubst %/,%,$(dir $(srctree))) + srctree := $(patsubst %/,%,$(dir $(srctree))) endif -TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs) - -CFLAGS := -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -LDFLAGS:= -flto=auto -ggdb $(EXTRA_LDFLAGS) -LIBS := $$($(PKG_CONFIG) --libs libtracefs) - -SRC:= $(wildcard src/*.c) -HDR:= $(wildcard src/*.h) -OBJ:= $(SRC:.c=.o) -DIRS := src -FILES := Makefile README.txt -CEXT := bz2 -TARBALL:= $(NAME)-$(VERSION).tar.$(CEXT) -TAROPTS:= -cvjf $(TARBALL) -BINDIR := /usr/bin -DATADIR:= /usr/share -DOCDIR := $(DATADIR)/doc -MANDIR := $(DATADIR)/man -LICDIR := $(DATADIR)/licenses -SRCTREE:= $(o
[PATCH V3 1/3] tools/tracing: Use tools/build makefiles on latency-collector
Use tools/build/ makefiles to build latency-collector, inheriting the benefits of it. For example: Before this patch, a missing tracefs/traceevents headers will result in fail like this: ~/linux/tools/tracing/latency $ make cc -Wall -Wextra -g -O2 -o latency-collector latency-collector.c -lpthread latency-collector.c:26:10: fatal error: tracefs.h: No such file or directory 26 | #include | ^~~ compilation terminated. make: *** [Makefile:14: latency-collector] Error 1 Which is not that helpful. After this change it reports: ~/linux/tools/tracing/latency# make Auto-detecting system features: ... libtraceevent: [ OFF ] ... libtracefs: [ OFF ] libtraceevent is missing. Please install libtraceevent-dev/libtraceevent-devel libtracefs is missing. Please install libtracefs-dev/libtracefs-devel Makefile.config:29: *** Please, check the errors above.. Stop. This type of output is common across other tools in tools/ like perf and objtool. Suggested-by: Linus Torvalds Signed-off-by: Daniel Bristot de Oliveira --- tools/tracing/latency/.gitignore | 5 +- tools/tracing/latency/Build | 1 + tools/tracing/latency/Makefile| 105 +- tools/tracing/latency/Makefile.config | 30 4 files changed, 122 insertions(+), 19 deletions(-) create mode 100644 tools/tracing/latency/Build create mode 100644 tools/tracing/latency/Makefile.config diff --git a/tools/tracing/latency/.gitignore b/tools/tracing/latency/.gitignore index 0863960761e7..2bb8e60f7fdd 100644 --- a/tools/tracing/latency/.gitignore +++ b/tools/tracing/latency/.gitignore @@ -1,2 +1,5 @@ -# SPDX-License-Identifier: GPL-2.0 +# SPDX-License-Identifier: GPL-2.0-only latency-collector +fixdep +feature +FEATURE-DUMP diff --git a/tools/tracing/latency/Build b/tools/tracing/latency/Build new file mode 100644 index ..0ce65ea72bf9 --- /dev/null +++ b/tools/tracing/latency/Build @@ -0,0 +1 @@ +latency-collector-y += latency-collector.o diff --git a/tools/tracing/latency/Makefile b/tools/tracing/latency/Makefile index 40c4ddaf8be1..6518b03e05c7 100644 --- a/tools/tracing/latency/Makefile +++ b/tools/tracing/latency/Makefile @@ -1,24 +1,93 @@ -# SPDX-License-Identifier: GPL-2.0 -# Makefile for vm tools -# -VAR_CFLAGS := $(shell pkg-config --cflags libtracefs 2>/dev/null) -VAR_LDLIBS := $(shell pkg-config --libs libtracefs 2>/dev/null) +# SPDX-License-Identifier: GPL-2.0-only -TARGETS = latency-collector -CFLAGS = -Wall -Wextra -g -O2 $(VAR_CFLAGS) -LDFLAGS = -lpthread $(VAR_LDLIBS) +ifeq ($(srctree),) + srctree := $(patsubst %/,%,$(dir $(CURDIR))) + srctree := $(patsubst %/,%,$(dir $(srctree))) + srctree := $(patsubst %/,%,$(dir $(srctree))) +endif -all: $(TARGETS) +include $(srctree)/tools/scripts/Makefile.include -%: %.c - $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) +# O is an alias for OUTPUT +OUTPUT := $(O) -clean: - $(RM) latency-collector +ifeq ($(OUTPUT),) + OUTPUT := $(CURDIR) +else + # subdir is used by the ../Makefile in $(call descend,) + ifneq ($(subdir),) +OUTPUT := $(OUTPUT)/$(subdir) + endif +endif + +ifneq ($(patsubst %/,,$(lastword $(OUTPUT))),) + OUTPUT := $(OUTPUT)/ +endif + +LATENCY-COLLECTOR := $(OUTPUT)latency-collector +LATENCY-COLLECTOR_IN := $(LATENCY-COLLECTOR)-in.o + +export CC := gcc +export LD := ld +export AR := ar +export PKG_CONFIG := pkg-config + +FEATURE_TESTS := libtraceevent +FEATURE_TESTS += libtracefs +FEATURE_DISPLAY:= libtraceevent +FEATURE_DISPLAY+= libtracefs + +ifeq ($(V),1) + Q= +else + Q= @ +endif + +all: $(LATENCY-COLLECTOR) + +include $(srctree)/tools/build/Makefile.include + +# check for dependencies only on required targets +NON_CONFIG_TARGETS := clean install -prefix ?= /usr/local -sbindir ?= ${prefix}/sbin +config := 1 +ifdef MAKECMDGOALS +ifeq ($(filter-out $(NON_CONFIG_TARGETS),$(MAKECMDGOALS)),) + config:= 0 +endif +endif -install: all - install -d $(DESTDIR)$(sbindir) - install -m 755 -p $(TARGETS) $(DESTDIR)$(sbindir) +ifeq ($(config),1) + include $(srctree)/tools/build/Makefile.feature + include Makefile.config +endif + +CFLAGS += $(INCLUDES) $(LIB_INCLUDES) + +export CFLAGS OUTPUT srctree + +$(LATENCY-COLLECTOR): $(LATENCY-COLLECTOR_IN) + $(QUIET_LINK)$(CC) $(LDFLAGS) -o $(LATENCY-COLLECTOR) $(LATENCY-COLLECTOR_IN) $(EXTLIBS) + +latency-collector.%: fixdep FORCE + make -f $(srctree)/tools/build/Makefile.build dir=. $@ + +$(LATENCY-COLLECTOR_IN): fixdep FORCE + make $(build)=latency-collector + +INSTALL:= install +MKDIR := mkdir +STRIP := strip +BINDIR := /usr/bin + +install: + @$(MKDIR) -p $(DESTDIR)$(BINDIR) + $(call QUIET_INSTALL,latency-collector)$(INSTALL) $(LATENCY-COLLE
[PATCH V3 0/3] tools/tracing: Use tools/build makefiles like perf
tools/tracing and tools/verification/rv are using standalone Makefiles and this approach has some drawbacks. For example, code duplication and lack of proper dependency handling, making things harder for users. Linus suggested using perf's build system, and it is indeed the best way to go. This series replaces tools/tracing and tools/verification/rv makefiles with makefiles inspired by perf and objtool that use tools/build/ infrastructure. Thanks, Arnaldo, for the pointers via chat. Link: https://lore.kernel.org/lkml/CAHk-=wjq9bjkbpi3sjn2dy5jvwpo03u9aoc6-g8anlcgq-e...@mail.gmail.com/ Changes from V2: - Link: https://lore.kernel.org/lkml/cover.1710512430.git.bris...@kernel.org/ - Fix the log of the first patch, removing the '- %< ' because some scripts interpret them as the '---' patch separator, truncating the log. Changes from V1: - Link: https://lore.kernel.org/lkml/cover.1709914259.git.bris...@kernel.org/ - Proper handle O= and OUTPUT= flags fixing "make tools/tracing" - Cleanups Daniel Bristot de Oliveira (3): tools/tracing: Use tools/build makefiles on latency-collector tools/rtla: Use tools/build makefiles to build rtla tools/verification: Use tools/build makefiles on rv tools/tracing/latency/.gitignore | 5 +- tools/tracing/latency/Build| 1 + tools/tracing/latency/Makefile | 105 ++-- tools/tracing/latency/Makefile.config | 30 tools/tracing/rtla/.gitignore | 7 +- tools/tracing/rtla/Build | 1 + tools/tracing/rtla/Makefile| 217 + tools/tracing/rtla/Makefile.config | 47 ++ tools/tracing/rtla/Makefile.rtla | 80 + tools/tracing/rtla/Makefile.standalone | 26 +++ tools/tracing/rtla/src/Build | 11 ++ tools/verification/rv/.gitignore | 6 + tools/verification/rv/Build| 1 + tools/verification/rv/Makefile | 207 +-- tools/verification/rv/Makefile.config | 47 ++ tools/verification/rv/Makefile.rv | 51 ++ tools/verification/rv/src/Build| 4 + 17 files changed, 549 insertions(+), 297 deletions(-) create mode 100644 tools/tracing/latency/Build create mode 100644 tools/tracing/latency/Makefile.config create mode 100644 tools/tracing/rtla/Build create mode 100644 tools/tracing/rtla/Makefile.config create mode 100644 tools/tracing/rtla/Makefile.rtla create mode 100644 tools/tracing/rtla/Makefile.standalone create mode 100644 tools/tracing/rtla/src/Build create mode 100644 tools/verification/rv/.gitignore create mode 100644 tools/verification/rv/Build create mode 100644 tools/verification/rv/Makefile.config create mode 100644 tools/verification/rv/Makefile.rv create mode 100644 tools/verification/rv/src/Build -- 2.44.0
Re: [PATCH V2 1/3] tools/tracing: Use tools/build makefiles on latency-collector
On 3/15/24 15:53, Arnaldo Carvalho de Melo wrote: > On Fri, Mar 15, 2024 at 03:48:58PM +0100, Daniel Bristot de Oliveira wrote: >> On 3/15/24 15:24, Daniel Bristot de Oliveira wrote: >>> Use tools/build/ makefiles to build latency-collector, inheriting >>> the benefits of it. For example: Before this patch, a missing >>> tracefs/traceevents headers will result in fail like this: >>> >>> %< --- >> >> Oops, b4 is interpreting these -- as the '---' separator, and is >> truncating >> the message. I will fix this in a v3. >> >> sorry. > > Yeah, that confuses scripts, that separator. checkpatch did not point that as an error, and git am works as well but it indeed confuses other scripts... Thanks, Arnaldo. -- Daniel
Re: [PATCH V2 1/3] tools/tracing: Use tools/build makefiles on latency-collector
On 3/15/24 15:24, Daniel Bristot de Oliveira wrote: > Use tools/build/ makefiles to build latency-collector, inheriting > the benefits of it. For example: Before this patch, a missing > tracefs/traceevents headers will result in fail like this: > > %< --- Oops, b4 is interpreting these -- as the '---' separator, and is truncating the message. I will fix this in a v3. sorry. -- Daniel
[PATCH V2 3/3] tools/verification: Use tools/build makefiles on rv
Use tools/build/ makefiles to build rv, inheriting the benefits of it. For example, having a proper way to handle dependencies. Suggested-by: Linus Torvalds Signed-off-by: Daniel Bristot de Oliveira --- tools/verification/rv/.gitignore | 6 + tools/verification/rv/Build | 1 + tools/verification/rv/Makefile| 207 +- tools/verification/rv/Makefile.config | 47 ++ tools/verification/rv/Makefile.rv | 51 +++ tools/verification/rv/src/Build | 4 + 6 files changed, 183 insertions(+), 133 deletions(-) create mode 100644 tools/verification/rv/.gitignore create mode 100644 tools/verification/rv/Build create mode 100644 tools/verification/rv/Makefile.config create mode 100644 tools/verification/rv/Makefile.rv create mode 100644 tools/verification/rv/src/Build diff --git a/tools/verification/rv/.gitignore b/tools/verification/rv/.gitignore new file mode 100644 index ..34a486585a34 --- /dev/null +++ b/tools/verification/rv/.gitignore @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0-only +rv +rv-static +fixdep +feature +FEATURE-DUMP diff --git a/tools/verification/rv/Build b/tools/verification/rv/Build new file mode 100644 index ..a44c22349d4b --- /dev/null +++ b/tools/verification/rv/Build @@ -0,0 +1 @@ +rv-y += src/ diff --git a/tools/verification/rv/Makefile b/tools/verification/rv/Makefile index 485f8aeddbe0..411d62b3d8eb 100644 --- a/tools/verification/rv/Makefile +++ b/tools/verification/rv/Makefile @@ -1,146 +1,87 @@ -NAME := rv -# Follow the kernel version -VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. kernelversion | grep -v make) - -# From libtracefs: -# Makefiles suck: This macro sets a default value of $(2) for the -# variable named by $(1), unless the variable has been set by -# environment or command line. This is necessary for CC and AR -# because make sets default values, so the simpler ?= approach -# won't work as expected. -define allow-override - $(if $(or $(findstring environment,$(origin $(1))),\ -$(findstring command line,$(origin $(1,,\ -$(eval $(1) = $(2))) -endef - -# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix. -$(call allow-override,CC,$(CROSS_COMPILE)gcc) -$(call allow-override,AR,$(CROSS_COMPILE)ar) -$(call allow-override,STRIP,$(CROSS_COMPILE)strip) -$(call allow-override,PKG_CONFIG,pkg-config) -$(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/) -$(call allow-override,LDCONFIG,ldconfig) - -INSTALL= install -MKDIR = mkdir -FOPTS := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \ - -fasynchronous-unwind-tables -fstack-clash-protection -WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized - -ifeq ($(CC),clang) - FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS)) - WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS)) +# SPDX-License-Identifier: GPL-2.0-only + +ifeq ($(srctree),) + srctree := $(patsubst %/,%,$(dir $(CURDIR))) + srctree := $(patsubst %/,%,$(dir $(srctree))) + srctree := $(patsubst %/,%,$(dir $(srctree))) endif -TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs) - -CFLAGS := -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -I include -LDFLAGS:= -flto=auto -ggdb $(EXTRA_LDFLAGS) -LIBS := $$($(PKG_CONFIG) --libs libtracefs) - -SRC:= $(wildcard src/*.c) -HDR:= $(wildcard src/*.h) -OBJ:= $(SRC:.c=.o) -DIRS := src -FILES := Makefile README.txt -CEXT := bz2 -TARBALL:= $(NAME)-$(VERSION).tar.$(CEXT) -TAROPTS:= -cvjf $(TARBALL) -BINDIR := /usr/bin -DATADIR:= /usr/share -DOCDIR := $(DATADIR)/doc -MANDIR := $(DATADIR)/man -LICDIR := $(DATADIR)/licenses -SRCTREE:= $(or $(BUILD_SRC),$(CURDIR)) - -# If running from the tarball, man pages are stored in the Documentation -# dir. If running from the kernel source, man pages are stored in -# Documentation/tools/rv/. -ifneq ($(wildcard Documentation/.*),) -DOCSRC = Documentation/ +include $(srctree)/tools/scripts/Makefile.include + +# O is an alias for OUTPUT +OUTPUT := $(O) + +ifeq ($(OUTPUT),) + OUTPUT := $(CURDIR)/ else -DOCSRC = $(SRCTREE)/../../../Documentation/tools/rv/ + # subdir is used by the ../Makefile in $(call descend,) + ifneq ($(subdir),) +OUTPUT:= $(OUTPUT)/$(subdir) + endif +endif + +ifneq ($(patsubst %/,,$(lastword $(OUTPUT))),) + OUTPUT := $(OUTPUT)/ endif -LIBTRACEEVENT_MIN_VERSION = 1.5 -LIBTRACEFS_MIN_VERSION = 1.3 +RV := $(OUTPUT)rv +RV_IN := $(RV)-in.o -.PHONY:all warnings show_warnings -all: warnings rv +VERSION:= $(shell sh -c "make -sC ../../.. kernelversion | grep
[PATCH V2 2/3] tools/rtla: Use tools/build makefiles to build rtla
Use tools/build/ makefiles to build rtla, inheriting the benefits of it. For example, having a proper way to handle dependencies. rtla is built using perf infra-structure when building inside the kernel tree. At this point, rtla diverges from perf in two points: Documentation and tarball generation/build. At the documentation level, rtla is one step ahead, placing the documentation at Documentation/tools/rtla/, using the same build tools as kernel documentation. The idea is to move perf documentation to the same scheme and then share the same makefiles. rtla has a tarball target that the (old) RHEL8 uses. The tarball was kept using a simple standalone makefile for compatibility. The standalone makefile shares most of the code, e.g., flags, with regular buildings. The tarball method was set as deprecated. If necessary, we can make a rtla tarball like perf, which includes the entire tools/build. But this would also require changes in the user side (the directory structure changes, and probably the deps to build the package). Inspired on perf and objtool. Suggested-by: Linus Torvalds Signed-off-by: Daniel Bristot de Oliveira --- tools/tracing/rtla/.gitignore | 7 +- tools/tracing/rtla/Build | 1 + tools/tracing/rtla/Makefile| 217 + tools/tracing/rtla/Makefile.config | 47 ++ tools/tracing/rtla/Makefile.rtla | 80 + tools/tracing/rtla/Makefile.standalone | 26 +++ tools/tracing/rtla/src/Build | 11 ++ 7 files changed, 244 insertions(+), 145 deletions(-) create mode 100644 tools/tracing/rtla/Build create mode 100644 tools/tracing/rtla/Makefile.config create mode 100644 tools/tracing/rtla/Makefile.rtla create mode 100644 tools/tracing/rtla/Makefile.standalone create mode 100644 tools/tracing/rtla/src/Build diff --git a/tools/tracing/rtla/.gitignore b/tools/tracing/rtla/.gitignore index e9df32419b2b..293f0dbb0ca2 100644 --- a/tools/tracing/rtla/.gitignore +++ b/tools/tracing/rtla/.gitignore @@ -1 +1,6 @@ -/rtla +# SPDX-License-Identifier: GPL-2.0-only +rtla +rtla-static +fixdep +feature +FEATURE-DUMP diff --git a/tools/tracing/rtla/Build b/tools/tracing/rtla/Build new file mode 100644 index ..6c9d5b36a315 --- /dev/null +++ b/tools/tracing/rtla/Build @@ -0,0 +1 @@ +rtla-y += src/ diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile index afd18c678ff5..b5878be36125 100644 --- a/tools/tracing/rtla/Makefile +++ b/tools/tracing/rtla/Makefile @@ -1,157 +1,86 @@ -NAME := rtla -# Follow the kernel version -VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. kernelversion | grep -v make) - -# From libtracefs: -# Makefiles suck: This macro sets a default value of $(2) for the -# variable named by $(1), unless the variable has been set by -# environment or command line. This is necessary for CC and AR -# because make sets default values, so the simpler ?= approach -# won't work as expected. -define allow-override - $(if $(or $(findstring environment,$(origin $(1))),\ -$(findstring command line,$(origin $(1,,\ -$(eval $(1) = $(2))) -endef - -# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix. -$(call allow-override,CC,$(CROSS_COMPILE)gcc) -$(call allow-override,AR,$(CROSS_COMPILE)ar) -$(call allow-override,STRIP,$(CROSS_COMPILE)strip) -$(call allow-override,PKG_CONFIG,pkg-config) -$(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/) -$(call allow-override,LDCONFIG,ldconfig) - -INSTALL= install -MKDIR = mkdir -FOPTS := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \ - -fasynchronous-unwind-tables -fstack-clash-protection -WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized - -ifeq ($(CC),clang) - FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS)) - WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS)) +# SPDX-License-Identifier: GPL-2.0-only + +ifeq ($(srctree),) + srctree := $(patsubst %/,%,$(dir $(CURDIR))) + srctree := $(patsubst %/,%,$(dir $(srctree))) + srctree := $(patsubst %/,%,$(dir $(srctree))) endif -TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs) - -CFLAGS := -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -LDFLAGS:= -flto=auto -ggdb $(EXTRA_LDFLAGS) -LIBS := $$($(PKG_CONFIG) --libs libtracefs) - -SRC:= $(wildcard src/*.c) -HDR:= $(wildcard src/*.h) -OBJ:= $(SRC:.c=.o) -DIRS := src -FILES := Makefile README.txt -CEXT := bz2 -TARBALL:= $(NAME)-$(VERSION).tar.$(CEXT) -TAROPTS:= -cvjf $(TARBALL) -BINDIR := /usr/bin -DATADIR:= /usr/share -DOCDIR := $(DATADIR)/doc -MANDIR := $(DATADIR)/man -LICDIR := $(DATADIR)/licenses -SRCTREE:= $(o
[PATCH V2 1/3] tools/tracing: Use tools/build makefiles on latency-collector
Use tools/build/ makefiles to build latency-collector, inheriting the benefits of it. For example: Before this patch, a missing tracefs/traceevents headers will result in fail like this: %< --- ~/linux/tools/tracing/latency $ make cc -Wall -Wextra -g -O2 -o latency-collector latency-collector.c -lpthread latency-collector.c:26:10: fatal error: tracefs.h: No such file or directory 26 | #include | ^~~ compilation terminated. make: *** [Makefile:14: latency-collector] Error 1 %< --- Which is not that helpful. After this change it reports: %< --- ~/linux/tools/tracing/latency# make Auto-detecting system features: ... libtraceevent: [ OFF ] ... libtracefs: [ OFF ] libtraceevent is missing. Please install libtraceevent-dev/libtraceevent-devel libtracefs is missing. Please install libtracefs-dev/libtracefs-devel Makefile.config:29: *** Please, check the errors above.. Stop. %< --- This type of output is common across other tools in tools/ like perf and objtool. Suggested-by: Linus Torvalds Signed-off-by: Daniel Bristot de Oliveira --- tools/tracing/latency/.gitignore | 5 +- tools/tracing/latency/Build | 1 + tools/tracing/latency/Makefile| 105 +- tools/tracing/latency/Makefile.config | 30 4 files changed, 122 insertions(+), 19 deletions(-) create mode 100644 tools/tracing/latency/Build create mode 100644 tools/tracing/latency/Makefile.config diff --git a/tools/tracing/latency/.gitignore b/tools/tracing/latency/.gitignore index 0863960761e7..2bb8e60f7fdd 100644 --- a/tools/tracing/latency/.gitignore +++ b/tools/tracing/latency/.gitignore @@ -1,2 +1,5 @@ -# SPDX-License-Identifier: GPL-2.0 +# SPDX-License-Identifier: GPL-2.0-only latency-collector +fixdep +feature +FEATURE-DUMP diff --git a/tools/tracing/latency/Build b/tools/tracing/latency/Build new file mode 100644 index ..0ce65ea72bf9 --- /dev/null +++ b/tools/tracing/latency/Build @@ -0,0 +1 @@ +latency-collector-y += latency-collector.o diff --git a/tools/tracing/latency/Makefile b/tools/tracing/latency/Makefile index 40c4ddaf8be1..6518b03e05c7 100644 --- a/tools/tracing/latency/Makefile +++ b/tools/tracing/latency/Makefile @@ -1,24 +1,93 @@ -# SPDX-License-Identifier: GPL-2.0 -# Makefile for vm tools -# -VAR_CFLAGS := $(shell pkg-config --cflags libtracefs 2>/dev/null) -VAR_LDLIBS := $(shell pkg-config --libs libtracefs 2>/dev/null) +# SPDX-License-Identifier: GPL-2.0-only -TARGETS = latency-collector -CFLAGS = -Wall -Wextra -g -O2 $(VAR_CFLAGS) -LDFLAGS = -lpthread $(VAR_LDLIBS) +ifeq ($(srctree),) + srctree := $(patsubst %/,%,$(dir $(CURDIR))) + srctree := $(patsubst %/,%,$(dir $(srctree))) + srctree := $(patsubst %/,%,$(dir $(srctree))) +endif -all: $(TARGETS) +include $(srctree)/tools/scripts/Makefile.include -%: %.c - $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) +# O is an alias for OUTPUT +OUTPUT := $(O) -clean: - $(RM) latency-collector +ifeq ($(OUTPUT),) + OUTPUT := $(CURDIR) +else + # subdir is used by the ../Makefile in $(call descend,) + ifneq ($(subdir),) +OUTPUT := $(OUTPUT)/$(subdir) + endif +endif + +ifneq ($(patsubst %/,,$(lastword $(OUTPUT))),) + OUTPUT := $(OUTPUT)/ +endif + +LATENCY-COLLECTOR := $(OUTPUT)latency-collector +LATENCY-COLLECTOR_IN := $(LATENCY-COLLECTOR)-in.o + +export CC := gcc +export LD := ld +export AR := ar +export PKG_CONFIG := pkg-config + +FEATURE_TESTS := libtraceevent +FEATURE_TESTS += libtracefs +FEATURE_DISPLAY:= libtraceevent +FEATURE_DISPLAY+= libtracefs + +ifeq ($(V),1) + Q= +else + Q= @ +endif + +all: $(LATENCY-COLLECTOR) + +include $(srctree)/tools/build/Makefile.include + +# check for dependencies only on required targets +NON_CONFIG_TARGETS := clean install -prefix ?= /usr/local -sbindir ?= ${prefix}/sbin +config := 1 +ifdef MAKECMDGOALS +ifeq ($(filter-out $(NON_CONFIG_TARGETS),$(MAKECMDGOALS)),) + config:= 0 +endif +endif -install: all - install -d $(DESTDIR)$(sbindir) - install -m 755 -p $(TARGETS) $(DESTDIR)$(sbindir) +ifeq ($(config),1) + include $(srctree)/tools/build/Makefile.feature + include Makefile.config +endif + +CFLAGS += $(INCLUDES) $(LIB_INCLUDES) + +export CFLAGS OUTPUT srctree + +$(LATENCY-COLLECTOR): $(LATENCY-COLLECTOR_IN) + $(QUIET_LINK)$(CC) $(LDFLAGS) -o $(LATENCY-COLLECTOR) $(LATENCY-COLLECTOR_IN) $(EXTLIBS) + +latency-collector.%: fixdep FORCE + make -f $(srctree)/tools/build/Makefile.build dir=. $@ + +$(LATENCY-COLLECTOR_IN): fixdep FORCE + make $(bui
[PATCH V2 0/3] tools/tracing: Use tools/build makefiles like perf
tools/tracing and tools/verification/rv are using standalone Makefiles and this approach has some drawbacks. For example, code duplication and lack of proper dependency handling, making things harder for users. Linus suggested using perf's build system, and it is indeed the best way to go. This series replaces tools/tracing and tools/verification/rv makefiles with makefiles inspired by perf and objtool that use tools/build/ infrastructure. Thanks, Arnaldo, for the pointers via chat. Link: https://lore.kernel.org/lkml/CAHk-=wjq9bjkbpi3sjn2dy5jvwpo03u9aoc6-g8anlcgq-e...@mail.gmail.com/ Changes from V1: - Link: https://lore.kernel.org/lkml/cover.1709914259.git.bris...@kernel.org/ - Proper handle O= and OUTPUT= flags fixing "make tools/tracing" - Cleanups Daniel Bristot de Oliveira (3): tools/tracing: Use tools/build makefiles on latency-collector tools/rtla: Use tools/build makefiles to build rtla tools/verification: Use tools/build makefiles on rv tools/tracing/latency/.gitignore | 5 +- tools/tracing/latency/Build| 1 + tools/tracing/latency/Makefile | 105 ++-- tools/tracing/latency/Makefile.config | 30 tools/tracing/rtla/.gitignore | 7 +- tools/tracing/rtla/Build | 1 + tools/tracing/rtla/Makefile| 217 + tools/tracing/rtla/Makefile.config | 47 ++ tools/tracing/rtla/Makefile.rtla | 80 + tools/tracing/rtla/Makefile.standalone | 26 +++ tools/tracing/rtla/src/Build | 11 ++ tools/verification/rv/.gitignore | 6 + tools/verification/rv/Build| 1 + tools/verification/rv/Makefile | 207 +-- tools/verification/rv/Makefile.config | 47 ++ tools/verification/rv/Makefile.rv | 51 ++ tools/verification/rv/src/Build| 4 + 17 files changed, 549 insertions(+), 297 deletions(-) create mode 100644 tools/tracing/latency/Build create mode 100644 tools/tracing/latency/Makefile.config create mode 100644 tools/tracing/rtla/Build create mode 100644 tools/tracing/rtla/Makefile.config create mode 100644 tools/tracing/rtla/Makefile.rtla create mode 100644 tools/tracing/rtla/Makefile.standalone create mode 100644 tools/tracing/rtla/src/Build create mode 100644 tools/verification/rv/.gitignore create mode 100644 tools/verification/rv/Build create mode 100644 tools/verification/rv/Makefile.config create mode 100644 tools/verification/rv/Makefile.rv create mode 100644 tools/verification/rv/src/Build -- 2.44.0
[PATCH 3/3] tools/rtla: Use tools/build makefiles to build rtla
Use tools/build/ makefiles to build rtla, inheriting the benefits of it. For example, having a proper way to handle dependencies. rtla is built using perf infra-structure when building inside the kernel tree. At this point, rtla diverges from perf in two points: Documentation and tarball generation/build. At the documentation level, rtla is one step ahead, placing the documentation at Documentation/tools/rtla/, using the same build tools as kernel documentation. The idea is to move perf documentation to the same scheme and then share the same makefiles. rtla has a tarball target that the (old) RHEL8 uses. The tarball was kept using a simple standalone makefile for compatibility. The standalone makefile shares most of the code, e.g., flags, with regular buildings. The tarball method was set as deprecated. If necessary, we can make a rtla tarball like perf, which includes the entire tools/build. But this would also require changes in the user side (the directory structure changes, and probably the deps to build the package). Inspired on perf and objtool. Suggested-by: Linus Torvalds Signed-off-by: Daniel Bristot de Oliveira --- tools/tracing/rtla/.gitignore | 4 +- tools/tracing/rtla/Build | 1 + tools/tracing/rtla/Makefile| 177 + tools/tracing/rtla/Makefile.config | 44 ++ tools/tracing/rtla/Makefile.rtla | 77 +++ tools/tracing/rtla/Makefile.standalone | 24 tools/tracing/rtla/src/Build | 11 ++ 7 files changed, 193 insertions(+), 145 deletions(-) create mode 100644 tools/tracing/rtla/Build create mode 100644 tools/tracing/rtla/Makefile.config create mode 100644 tools/tracing/rtla/Makefile.rtla create mode 100644 tools/tracing/rtla/Makefile.standalone create mode 100644 tools/tracing/rtla/src/Build diff --git a/tools/tracing/rtla/.gitignore b/tools/tracing/rtla/.gitignore index e9df32419b2b..9ec8a142e157 100644 --- a/tools/tracing/rtla/.gitignore +++ b/tools/tracing/rtla/.gitignore @@ -1 +1,3 @@ -/rtla +rtla +rtla-static +FEATURE-DUMP diff --git a/tools/tracing/rtla/Build b/tools/tracing/rtla/Build new file mode 100644 index ..6c9d5b36a315 --- /dev/null +++ b/tools/tracing/rtla/Build @@ -0,0 +1 @@ +rtla-y += src/ diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile index afd18c678ff5..1064b02bd735 100644 --- a/tools/tracing/rtla/Makefile +++ b/tools/tracing/rtla/Makefile @@ -1,157 +1,46 @@ -NAME := rtla -# Follow the kernel version -VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. kernelversion | grep -v make) +# SPDX-License-Identifier: GPL-2.0 +export srctree := $(abspath ../../..) -# From libtracefs: -# Makefiles suck: This macro sets a default value of $(2) for the -# variable named by $(1), unless the variable has been set by -# environment or command line. This is necessary for CC and AR -# because make sets default values, so the simpler ?= approach -# won't work as expected. -define allow-override - $(if $(or $(findstring environment,$(origin $(1))),\ -$(findstring command line,$(origin $(1,,\ -$(eval $(1) = $(2))) -endef +VERSION:= $(shell sh -c "make -sC ../../.. kernelversion | grep -v make") +DOCSRC := ../../../Documentation/tools/rtla/ -# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix. -$(call allow-override,CC,$(CROSS_COMPILE)gcc) -$(call allow-override,AR,$(CROSS_COMPILE)ar) -$(call allow-override,STRIP,$(CROSS_COMPILE)strip) -$(call allow-override,PKG_CONFIG,pkg-config) -$(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/) -$(call allow-override,LDCONFIG,ldconfig) +FEATURE_TESTS := libtraceevent +FEATURE_TESTS += libtracefs +FEATURE_DISPLAY:= libtraceevent +FEATURE_DISPLAY+= libtracefs -INSTALL= install -MKDIR = mkdir -FOPTS := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \ - -fasynchronous-unwind-tables -fstack-clash-protection -WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized - -ifeq ($(CC),clang) - FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS)) - WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS)) -endif - -TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs) - -CFLAGS := -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -LDFLAGS:= -flto=auto -ggdb $(EXTRA_LDFLAGS) -LIBS := $$($(PKG_CONFIG) --libs libtracefs) - -SRC:= $(wildcard src/*.c) -HDR:= $(wildcard src/*.h) -OBJ:= $(SRC:.c=.o) -DIRS := src -FILES := Makefile README.txt -CEXT := bz2 -TARBALL:= $(NAME)-$(VERSION).tar.$(CEXT) -TAROPTS:= -cvjf $(TARBALL) -BINDIR := /usr/bin -DATADIR:= /usr/share -DOCDIR := $(DATADIR)/doc -MANDIR
[PATCH 2/3] tools/verification: Use tools/build makefiles on rv
Use tools/build/ makefiles to build rv, inheriting the benefits of it. For example, having a proper way to handle dependencies. Suggested-by: Linus Torvalds Signed-off-by: Daniel Bristot de Oliveira --- tools/verification/rv/.gitignore | 2 + tools/verification/rv/Build | 1 + tools/verification/rv/Makefile| 166 +- tools/verification/rv/Makefile.config | 44 +++ tools/verification/rv/Makefile.rv | 49 tools/verification/rv/src/Build | 4 + 6 files changed, 133 insertions(+), 133 deletions(-) create mode 100644 tools/verification/rv/.gitignore create mode 100644 tools/verification/rv/Build create mode 100644 tools/verification/rv/Makefile.config create mode 100644 tools/verification/rv/Makefile.rv create mode 100644 tools/verification/rv/src/Build diff --git a/tools/verification/rv/.gitignore b/tools/verification/rv/.gitignore new file mode 100644 index ..a446aca35eb2 --- /dev/null +++ b/tools/verification/rv/.gitignore @@ -0,0 +1,2 @@ +rv +FEATURE-DUMP diff --git a/tools/verification/rv/Build b/tools/verification/rv/Build new file mode 100644 index ..a44c22349d4b --- /dev/null +++ b/tools/verification/rv/Build @@ -0,0 +1 @@ +rv-y += src/ diff --git a/tools/verification/rv/Makefile b/tools/verification/rv/Makefile index 485f8aeddbe0..da16b7c4db0f 100644 --- a/tools/verification/rv/Makefile +++ b/tools/verification/rv/Makefile @@ -1,146 +1,46 @@ -NAME := rv -# Follow the kernel version -VERSION := $(shell cat VERSION 2> /dev/null || make -sC ../../.. kernelversion | grep -v make) +# SPDX-License-Identifier: GPL-2.0 +export srctree := $(abspath ../../..) -# From libtracefs: -# Makefiles suck: This macro sets a default value of $(2) for the -# variable named by $(1), unless the variable has been set by -# environment or command line. This is necessary for CC and AR -# because make sets default values, so the simpler ?= approach -# won't work as expected. -define allow-override - $(if $(or $(findstring environment,$(origin $(1))),\ -$(findstring command line,$(origin $(1,,\ -$(eval $(1) = $(2))) -endef +VERSION:= $(shell sh -c "make -sC ../../.. kernelversion | grep -v make") +DOCSRC := ../../../Documentation/tools/rv/ -# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix. -$(call allow-override,CC,$(CROSS_COMPILE)gcc) -$(call allow-override,AR,$(CROSS_COMPILE)ar) -$(call allow-override,STRIP,$(CROSS_COMPILE)strip) -$(call allow-override,PKG_CONFIG,pkg-config) -$(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/) -$(call allow-override,LDCONFIG,ldconfig) +FEATURE_TESTS := libtraceevent +FEATURE_TESTS += libtracefs +FEATURE_DISPLAY:= libtraceevent +FEATURE_DISPLAY+= libtracefs -INSTALL= install -MKDIR = mkdir -FOPTS := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \ - -fasynchronous-unwind-tables -fstack-clash-protection -WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized - -ifeq ($(CC),clang) - FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS)) - WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS)) -endif - -TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs) - -CFLAGS := -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -I include -LDFLAGS:= -flto=auto -ggdb $(EXTRA_LDFLAGS) -LIBS := $$($(PKG_CONFIG) --libs libtracefs) - -SRC:= $(wildcard src/*.c) -HDR:= $(wildcard src/*.h) -OBJ:= $(SRC:.c=.o) -DIRS := src -FILES := Makefile README.txt -CEXT := bz2 -TARBALL:= $(NAME)-$(VERSION).tar.$(CEXT) -TAROPTS:= -cvjf $(TARBALL) -BINDIR := /usr/bin -DATADIR:= /usr/share -DOCDIR := $(DATADIR)/doc -MANDIR := $(DATADIR)/man -LICDIR := $(DATADIR)/licenses -SRCTREE:= $(or $(BUILD_SRC),$(CURDIR)) - -# If running from the tarball, man pages are stored in the Documentation -# dir. If running from the kernel source, man pages are stored in -# Documentation/tools/rv/. -ifneq ($(wildcard Documentation/.*),) -DOCSRC = Documentation/ +ifeq ($(V),1) + Q = else -DOCSRC = $(SRCTREE)/../../../Documentation/tools/rv/ -endif - -LIBTRACEEVENT_MIN_VERSION = 1.5 -LIBTRACEFS_MIN_VERSION = 1.3 - -.PHONY:all warnings show_warnings -all: warnings rv - -TEST_LIBTRACEEVENT = $(shell sh -c "$(PKG_CONFIG) --atleast-version $(LIBTRACEEVENT_MIN_VERSION) libtraceevent > /dev/null 2>&1 || echo n") -ifeq ("$(TEST_LIBTRACEEVENT)", "n") -WARNINGS = show_warnings -MISSING_LIBS += echo "** libtraceevent version $(LIBTRACEEVENT_MIN_VERSION) or higher"; -MISSING_PACKAGES += "libtraceevent-devel" -MISSING_
[PATCH 1/3] tools/tracing: Use tools/build makefiles on latency-collector
Use tools/build/ makefiles to build latency-collector, inheriting the benefits of it. For example, having a proper way to handle dependencies. Inspired on perf and objtool. Suggested-by: Linus Torvalds Signed-off-by: Daniel Bristot de Oliveira --- tools/tracing/latency/.gitignore | 1 + tools/tracing/latency/Build | 1 + tools/tracing/latency/Makefile| 48 +-- tools/tracing/latency/Makefile.config | 28 4 files changed, 61 insertions(+), 17 deletions(-) create mode 100644 tools/tracing/latency/Build create mode 100644 tools/tracing/latency/Makefile.config diff --git a/tools/tracing/latency/.gitignore b/tools/tracing/latency/.gitignore index 0863960761e7..11490d9da0b3 100644 --- a/tools/tracing/latency/.gitignore +++ b/tools/tracing/latency/.gitignore @@ -1,2 +1,3 @@ # SPDX-License-Identifier: GPL-2.0 latency-collector +FEATURE-DUMP diff --git a/tools/tracing/latency/Build b/tools/tracing/latency/Build new file mode 100644 index ..0ce65ea72bf9 --- /dev/null +++ b/tools/tracing/latency/Build @@ -0,0 +1 @@ +latency-collector-y += latency-collector.o diff --git a/tools/tracing/latency/Makefile b/tools/tracing/latency/Makefile index 40c4ddaf8be1..3a2b5e8f5997 100644 --- a/tools/tracing/latency/Makefile +++ b/tools/tracing/latency/Makefile @@ -1,24 +1,38 @@ # SPDX-License-Identifier: GPL-2.0 -# Makefile for vm tools -# -VAR_CFLAGS := $(shell pkg-config --cflags libtracefs 2>/dev/null) -VAR_LDLIBS := $(shell pkg-config --libs libtracefs 2>/dev/null) +export srctree := $(abspath ../../..) +export CC := gcc +export LD := ld +export AR := ar +export PKG_CONFIG := pkg-config -TARGETS = latency-collector -CFLAGS = -Wall -Wextra -g -O2 $(VAR_CFLAGS) -LDFLAGS = -lpthread $(VAR_LDLIBS) +FEATURE_TESTS := libtraceevent +FEATURE_TESTS += libtracefs +FEATURE_DISPLAY := libtraceevent +FEATURE_DISPLAY += libtracefs -all: $(TARGETS) +latency-collector: -%: %.c - $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) +include $(srctree)/tools/build/Makefile.include +include $(srctree)/tools/build/Makefile.feature +include $(srctree)/tools/scripts/Makefile.include +include Makefile.config -clean: - $(RM) latency-collector +CFLAGS += $(INCLUDES) $(LIB_INCLUDES) + +export CFLAGS := $(CFLAGS) + +latency-collector: latency-collector-in.o + $(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $^ $(EXTLIBS) -prefix ?= /usr/local -sbindir ?= ${prefix}/sbin +latency-collector.%: fixdep FORCE + make -f $(srctree)/tools/build/Makefile.build dir=. $@ + +latency-collector-in.o: fixdep FORCE + make $(build)=latency-collector + +clean: + $(call QUIET_CLEAN, latency-collector) + @find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete + @rm -f latency-collector FEATURE-DUMP -install: all - install -d $(DESTDIR)$(sbindir) - install -m 755 -p $(TARGETS) $(DESTDIR)$(sbindir) +.PHONY: FORCE clean diff --git a/tools/tracing/latency/Makefile.config b/tools/tracing/latency/Makefile.config new file mode 100644 index ..5100d89346ab --- /dev/null +++ b/tools/tracing/latency/Makefile.config @@ -0,0 +1,28 @@ +STOP_ERROR := + +define lib_setup + $(eval EXTLIBS += -l$(1)) + $(eval LIB_INCLUDES += $(shell sh -c "$(PKG_CONFIG) --cflags lib$(1)")) +endef + +$(call feature_check,libtraceevent) +ifeq ($(feature-libtraceevent), 1) + $(call detected,CONFIG_LIBTRACEEVENT) + $(call lib_setup,traceevent) +else + STOP_ERROR := 1 + $(info libtraceevent is missing. Please install libtraceevent-dev/libtraceevent-devel) +endif + +$(call feature_check,libtracefs) +ifeq ($(feature-libtracefs), 1) + $(call detected,CONFIG_LIBTRACEFS) + $(call lib_setup,tracefs) +else + STOP_ERROR := 1 + $(info libtracefs is missing. Please install libtracefs-dev/libtracefs-devel) +endif + +ifeq ($(STOP_ERROR),1) + $(error Please, check the errors above.) +endif -- 2.44.0
[PATCH 0/3] tools/tracing: Use tools/build makefiles like perf
tools/tracing and tools/verification/rv are using standalone Makefiles. However, This approach has some drawbacks. For example, code duplication and lack of proper dependency handling, making things harder for users. Linus suggested using perf's build system, and it is indeed the best way to go. This series replaces tools/tracing and tools/verification/rv makefiles with makefiles inspired by perf and objtool that use tools/build/ infrastructure. Thanks, Arnaldo, for the pointers via chat. Link: https://lore.kernel.org/lkml/CAHk-=wjq9bjkbpi3sjn2dy5jvwpo03u9aoc6-g8anlcgq-e...@mail.gmail.com/ Daniel Bristot de Oliveira (3): tools/tracing: Use tools/build makefiles on latency-collector tools/verification: Use tools/build makefiles on rv tools/rtla: Use tools/build makefiles to build rtla tools/tracing/latency/.gitignore | 1 + tools/tracing/latency/Build| 1 + tools/tracing/latency/Makefile | 48 --- tools/tracing/latency/Makefile.config | 28 tools/tracing/rtla/.gitignore | 4 +- tools/tracing/rtla/Build | 1 + tools/tracing/rtla/Makefile| 177 + tools/tracing/rtla/Makefile.config | 44 ++ tools/tracing/rtla/Makefile.rtla | 77 +++ tools/tracing/rtla/Makefile.standalone | 24 tools/tracing/rtla/src/Build | 11 ++ tools/verification/rv/.gitignore | 2 + tools/verification/rv/Build| 1 + tools/verification/rv/Makefile | 166 +-- tools/verification/rv/Makefile.config | 44 ++ tools/verification/rv/Makefile.rv | 49 +++ tools/verification/rv/src/Build| 4 + 17 files changed, 387 insertions(+), 295 deletions(-) create mode 100644 tools/tracing/latency/Build create mode 100644 tools/tracing/latency/Makefile.config create mode 100644 tools/tracing/rtla/Build create mode 100644 tools/tracing/rtla/Makefile.config create mode 100644 tools/tracing/rtla/Makefile.rtla create mode 100644 tools/tracing/rtla/Makefile.standalone create mode 100644 tools/tracing/rtla/src/Build create mode 100644 tools/verification/rv/.gitignore create mode 100644 tools/verification/rv/Build create mode 100644 tools/verification/rv/Makefile.config create mode 100644 tools/verification/rv/Makefile.rv create mode 100644 tools/verification/rv/src/Build -- 2.44.0
[GIT PULL] tracing/tools: Fixes for 6.8-rc4
Steven, Tracing tooling updates for 6.8-rc4 RTLA: - rtla tools are exiting with a positive value when usage() is called. Make them return 0 if the usage was called via -h/--help. - the -P priority sets the sched priority for rtla workload. When the SCHED_OTHER scheduler is selected, it sets the rt_priority instead of the nice parameter. Setting the nice value is the correct thing, so fix it. - rtla is failing to compile with clang due to unsupported options from gcc. Adjusting the compiler/linker options makes clang work properly. - Remove the sched_getattr() unused function on utils.c. - Fixes on variable initialization and size, reported by clang. Verification: - rv is failing to compile with clang due to unsupported options from gcc. Adjusting the compiler/linker options makes clang work properly. - Fix an uninitialized variable on in_kernel.c reported by clang. Please pull the latest trace-tools-fixes-v6.8-rc4 tree, which can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git trace-tools-fixes-v6.8-rc4 Tag SHA1: 63f0522db203c7d138595e760b9a237bc02990fa Head SHA1: b5f319360371087d52070d8f3fc7789e80ce69a6 Daniel Bristot de Oliveira (6): tools/rtla: Fix Makefile compiler options for clang tools/rtla: Fix uninitialized bucket/data->bucket_size warning tools/rtla: Fix clang warning about mount_point var size tools/rtla: Remove unused sched_getattr() function tools/rv: Fix Makefile compiler options for clang tools/rv: Fix curr_reactor uninitialized variable John Kacur (1): tools/rtla: Exit with EXIT_SUCCESS when help is invoked limingming3 (1): tools/rtla: Replace setting prio with nice for SCHED_OTHER tools/tracing/rtla/Makefile| 7 ++- tools/tracing/rtla/src/osnoise_hist.c | 9 ++--- tools/tracing/rtla/src/osnoise_top.c | 6 +- tools/tracing/rtla/src/timerlat_hist.c | 9 ++--- tools/tracing/rtla/src/timerlat_top.c | 6 +- tools/tracing/rtla/src/utils.c | 14 -- tools/tracing/rtla/src/utils.h | 2 ++ tools/verification/rv/Makefile | 7 ++- tools/verification/rv/src/in_kernel.c | 2 +- 9 files changed, 41 insertions(+), 21 deletions(-) --- diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile index 2456a399eb9a..afd18c678ff5 100644 --- a/tools/tracing/rtla/Makefile +++ b/tools/tracing/rtla/Makefile @@ -28,10 +28,15 @@ FOPTS := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \ -fasynchronous-unwind-tables -fstack-clash-protection WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized +ifeq ($(CC),clang) + FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS)) + WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS)) +endif + TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs) CFLAGS := -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -LDFLAGS:= -ggdb $(EXTRA_LDFLAGS) +LDFLAGS:= -flto=auto -ggdb $(EXTRA_LDFLAGS) LIBS := $$($(PKG_CONFIG) --libs libtracefs) SRC:= $(wildcard src/*.c) diff --git a/tools/tracing/rtla/src/osnoise_hist.c b/tools/tracing/rtla/src/osnoise_hist.c index 8f81fa007364..01870d50942a 100644 --- a/tools/tracing/rtla/src/osnoise_hist.c +++ b/tools/tracing/rtla/src/osnoise_hist.c @@ -135,8 +135,7 @@ static void osnoise_hist_update_multiple(struct osnoise_tool *tool, int cpu, if (params->output_divisor) duration = duration / params->output_divisor; - if (data->bucket_size) - bucket = duration / data->bucket_size; + bucket = duration / data->bucket_size; total_duration = duration * count; @@ -480,7 +479,11 @@ static void osnoise_hist_usage(char *usage) for (i = 0; msg[i]; i++) fprintf(stderr, "%s\n", msg[i]); - exit(1); + + if (usage) + exit(EXIT_FAILURE); + + exit(EXIT_SUCCESS); } /* diff --git a/tools/tracing/rtla/src/osnoise_top.c b/tools/tracing/rtla/src/osnoise_top.c index f7c959be8677..457360db0767 100644 --- a/tools/tracing/rtla/src/osnoise_top.c +++ b/tools/tracing/rtla/src/osnoise_top.c @@ -331,7 +331,11 @@ static void osnoise_top_usage(struct osnoise_top_params *params, char *usage) for (i = 0; msg[i]; i++) fprintf(stderr, "%s\n", msg[i]); - exit(1); + + if (usage) + exit(EXIT_FAILURE); + + exit(EXIT_SUCCESS); } /* diff --git a/tools/tracing/rtla/src/timerlat_hist.c b/tools/tracing/rtla/src/timerlat_hist.c index 47d3d8b53cb2..
Re: [PATCH] tools/rtla: Replace setting prio with nice for SCHED_OTHER
On 2/7/24 07:51, limingming3 wrote: > Since the sched_priority for SCHED_OTHER is always 0, it makes no > sence to set it. > Setting nice for SCHED_OTHER seems more meaningful. Thanks! This is actually a fix, I meant to set nice since the beginning. -- Daniel
Re: [PATCH 1/6] tools/rtla: Fix Makefile compiler options for clang
On 2/6/24 16:48, Nathan Chancellor wrote: > On Tue, Feb 06, 2024 at 12:05:29PM +0100, Daniel Bristot de Oliveira wrote: >> The following errors are showing up when compiling rtla with clang: >> >> $ make HOSTCC=clang CC=clang LLVM_IAS=1 >> [...] >> >> clang -O -g -DVERSION=\"6.8.0-rc1\" -flto=auto -ffat-lto-objects >> -fexceptions -fstack-protector-strong >> -fasynchronous-unwind-tables -fstack-clash-protection -Wall >> -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 >> -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized >> $(pkg-config --cflags libtracefs)-c -o src/utils.o src/utils.c >> >> clang: warning: optimization flag '-ffat-lto-objects' is not supported >> [-Wignored-optimization-argument] > > For what it's worth, this flag is supported in clang 17.0.0 and newer: > > https://github.com/llvm/llvm-project/commit/610fc5cbcc8b68879c562f6458608afe2473ab7f Good! still, I am getting this error on fedora, with this clang version: bristot@x1:~/src/git/linux/tools/tracing/rtla$ clang --version clang version 17.0.6 (Fedora 17.0.6-1.fc39) Target: x86_64-redhat-linux-gnu Thread model: posix InstalledDir: /usr/bin :-( > But if it is not critical, just dropping the flag like you have done > here rather than conditionally supporting it is probably easier. Yeah, I will drop it for now, and keep monitoring. Thanks Natan! -- Daniel
[PATCH 6/6] tools/rv: Fix curr_reactor uninitialized variable
clang is reporting: $ make HOSTCC=clang CC=clang LLVM_IAS=1 clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -I include -c -o src/in_kernel.o src/in_kernel.c [...] src/in_kernel.c:227:6: warning: variable 'curr_reactor' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] 227 | if (!end) | ^~~~ src/in_kernel.c:242:9: note: uninitialized use occurs here 242 | return curr_reactor; |^~~~ src/in_kernel.c:227:2: note: remove the 'if' if its condition is always false 227 | if (!end) | ^ 228 | goto out_free; | ~ src/in_kernel.c:221:6: warning: variable 'curr_reactor' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] 221 | if (!start) | ^~ src/in_kernel.c:242:9: note: uninitialized use occurs here 242 | return curr_reactor; |^~~~ src/in_kernel.c:221:2: note: remove the 'if' if its condition is always false 221 | if (!start) | ^~~ 222 | goto out_free; | ~ src/in_kernel.c:215:20: note: initialize the variable 'curr_reactor' to silence this warning 215 | char *curr_reactor; | ^ |= NULL 2 warnings generated. Which is correct. Setting curr_reactor to NULL avoids the problem. Cc: sta...@vger.kernel.org Fixes: 6d60f89691fc ("tools/rv: Add in-kernel monitor interface") Signed-off-by: Daniel Bristot de Oliveira --- tools/verification/rv/src/in_kernel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/verification/rv/src/in_kernel.c b/tools/verification/rv/src/in_kernel.c index ad28582bcf2b..f04479ecc96c 100644 --- a/tools/verification/rv/src/in_kernel.c +++ b/tools/verification/rv/src/in_kernel.c @@ -210,9 +210,9 @@ static char *ikm_read_reactor(char *monitor_name) static char *ikm_get_current_reactor(char *monitor_name) { char *reactors = ikm_read_reactor(monitor_name); + char *curr_reactor = NULL; char *start; char *end; - char *curr_reactor; if (!reactors) return NULL; -- 2.43.0
[PATCH 5/6] tools/rv: Fix Makefile compiler options for clang
The following errors are showing up when compiling rv with clang: $ make HOSTCC=clang CC=clang LLVM_IAS=1 [...] clang -O -g -DVERSION=\"6.8.0-rc1\" -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized $(pkg-config --cflags libtracefs) -I include -c -o src/utils.o src/utils.c clang: warning: optimization flag '-ffat-lto-objects' is not supported [-Wignored-optimization-argument] warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean '-Wno-uninitialized'? [-Wunknown-warning-option] 1 warning generated. clang -o rv -ggdb src/in_kernel.o src/rv.o src/trace.o src/utils.o $(pkg-config --libs libtracefs) src/in_kernel.o: file not recognized: file format not recognized clang: error: linker command failed with exit code 1 (use -v to see invocation) make: *** [Makefile:110: rv] Error 1 Solve these issues by: - removing -ffat-lto-objects and -Wno-maybe-uninitialized if using clang - informing the linker about -flto=auto Cc: sta...@vger.kernel.org Fixes: 4bc4b131d44c ("rv: Add rv tool") Suggested-by: Donald Zickus Signed-off-by: Daniel Bristot de Oliveira --- tools/verification/rv/Makefile | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tools/verification/rv/Makefile b/tools/verification/rv/Makefile index 3d0f3888a58c..485f8aeddbe0 100644 --- a/tools/verification/rv/Makefile +++ b/tools/verification/rv/Makefile @@ -28,10 +28,15 @@ FOPTS := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \ -fasynchronous-unwind-tables -fstack-clash-protection WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized +ifeq ($(CC),clang) + FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS)) + WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS)) +endif + TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs) CFLAGS := -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -I include -LDFLAGS:= -ggdb $(EXTRA_LDFLAGS) +LDFLAGS:= -flto=auto -ggdb $(EXTRA_LDFLAGS) LIBS := $$($(PKG_CONFIG) --libs libtracefs) SRC:= $(wildcard src/*.c) -- 2.43.0
[PATCH 4/6] tools/rtla: Remove unused sched_getattr() function
Clang is reporting: $ make HOSTCC=clang CC=clang LLVM_IAS=1 [...] clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs)-c -o src/utils.o src/utils.c src/utils.c:241:19: warning: unused function 'sched_getattr' [-Wunused-function] 241 | static inline int sched_getattr(pid_t pid, struct sched_attr *attr, | ^ 1 warning generated. Which is correct, so remove the unused function. Cc: sta...@vger.kernel.org Fixes: b1696371d865 ("rtla: Helper functions for rtla") Signed-off-by: Daniel Bristot de Oliveira --- tools/tracing/rtla/src/utils.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c index b998b24edf1e..5fcd6495ff05 100644 --- a/tools/tracing/rtla/src/utils.c +++ b/tools/tracing/rtla/src/utils.c @@ -238,12 +238,6 @@ static inline int sched_setattr(pid_t pid, const struct sched_attr *attr, return syscall(__NR_sched_setattr, pid, attr, flags); } -static inline int sched_getattr(pid_t pid, struct sched_attr *attr, - unsigned int size, unsigned int flags) -{ - return syscall(__NR_sched_getattr, pid, attr, size, flags); -} - int __set_sched_attr(int pid, struct sched_attr *attr) { int flags = 0; -- 2.43.0
[PATCH 3/6] tools/rtla: Fix clang warning about mount_point var size
clang is reporting this warning: $ make HOSTCC=clang CC=clang LLVM_IAS=1 [...] clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs)-c -o src/utils.o src/utils.c src/utils.c:548:66: warning: 'fscanf' may overflow; destination buffer in argument 3 has size 1024, but the corresponding specifier may require size 1025 [-Wfortify-source] 548 | while (fscanf(fp, "%*s %" STR(MAX_PATH) "s %99s %*s %*d %*d\n", mount_point, type) == 2) { | ^ Increase mount_point variable size to MAX_PATH+1 to avoid the overflow. Cc: sta...@vger.kernel.org Fixes: a957cbc02531 ("rtla: Add -C cgroup support") Signed-off-by: Daniel Bristot de Oliveira --- tools/tracing/rtla/src/utils.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c index c769d7b3842c..b998b24edf1e 100644 --- a/tools/tracing/rtla/src/utils.c +++ b/tools/tracing/rtla/src/utils.c @@ -536,7 +536,7 @@ int set_cpu_dma_latency(int32_t latency) */ static const int find_mount(const char *fs, char *mp, int sizeof_mp) { - char mount_point[MAX_PATH]; + char mount_point[MAX_PATH+1]; char type[100]; int found = 0; FILE *fp; -- 2.43.0
[PATCH 2/6] tools/rtla: Fix uninitialized bucket/data->bucket_size warning
When compiling rtla with clang, I am getting the following warnings: $ make HOSTCC=clang CC=clang LLVM_IAS=1 [..] clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -c -o src/osnoise_hist.o src/osnoise_hist.c src/osnoise_hist.c:138:6: warning: variable 'bucket' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] 138 | if (data->bucket_size) | ^ src/osnoise_hist.c:149:6: note: uninitialized use occurs here 149 | if (bucket < entries) | ^~ src/osnoise_hist.c:138:2: note: remove the 'if' if its condition is always true 138 | if (data->bucket_size) | ^~ 139 | bucket = duration / data->bucket_size; src/osnoise_hist.c:132:12: note: initialize the variable 'bucket' to silence this warning 132 | int bucket; | ^ |= 0 1 warning generated. [...] clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -c -o src/timerlat_hist.o src/timerlat_hist.c src/timerlat_hist.c:181:6: warning: variable 'bucket' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] 181 | if (data->bucket_size) | ^ src/timerlat_hist.c:204:6: note: uninitialized use occurs here 204 | if (bucket < entries) | ^~ src/timerlat_hist.c:181:2: note: remove the 'if' if its condition is always true 181 | if (data->bucket_size) | ^~ 182 | bucket = latency / data->bucket_size; src/timerlat_hist.c:175:12: note: initialize the variable 'bucket' to silence this warning 175 | int bucket; | ^ |= 0 1 warning generated. This is a legit warning, but data->bucket_size is always > 0 (see timerlat_hist_parse_args()), so the if is not necessary. Remove the unneeded if (data->bucket_size) to avoid the warning. Cc: sta...@vger.kernel.org Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode") Fixes: 829a6c0b5698 ("rtla/osnoise: Add the hist mode") Signed-off-by: Daniel Bristot de Oliveira --- tools/tracing/rtla/src/osnoise_hist.c | 3 +-- tools/tracing/rtla/src/timerlat_hist.c | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/tools/tracing/rtla/src/osnoise_hist.c b/tools/tracing/rtla/src/osnoise_hist.c index 8f81fa007364..67128171c29d 100644 --- a/tools/tracing/rtla/src/osnoise_hist.c +++ b/tools/tracing/rtla/src/osnoise_hist.c @@ -135,8 +135,7 @@ static void osnoise_hist_update_multiple(struct osnoise_tool *tool, int cpu, if (params->output_divisor) duration = duration / params->output_divisor; - if (data->bucket_size) - bucket = duration / data->bucket_size; + bucket = duration / data->bucket_size; total_duration = duration * count; diff --git a/tools/tracing/rtla/src/timerlat_hist.c b/tools/tracing/rtla/src/timerlat_hist.c index 47d3d8b53cb2..3a5b8c409e7d 100644 --- a/tools/tracing/rtla/src/timerlat_hist.c +++ b/tools/tracing/rtla/src/timerlat_hist.c @@ -178,8 +178,7 @@ timerlat_hist_update(struct osnoise_tool *tool, int cpu, if (params->output_divisor) latency = latency / params->output_divisor; - if (data->bucket_size) - bucket = latency / data->bucket_size; + bucket = latency / data->bucket_size; if (!context) { hist = data->hist[cpu].irq; -- 2.43.0
[PATCH 1/6] tools/rtla: Fix Makefile compiler options for clang
The following errors are showing up when compiling rtla with clang: $ make HOSTCC=clang CC=clang LLVM_IAS=1 [...] clang -O -g -DVERSION=\"6.8.0-rc1\" -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized $(pkg-config --cflags libtracefs)-c -o src/utils.o src/utils.c clang: warning: optimization flag '-ffat-lto-objects' is not supported [-Wignored-optimization-argument] warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean '-Wno-uninitialized'? [-Wunknown-warning-option] 1 warning generated. clang -o rtla -ggdb src/osnoise.o src/osnoise_hist.o src/osnoise_top.o src/rtla.o src/timerlat_aa.o src/timerlat.o src/timerlat_hist.o src/timerlat_top.o src/timerlat_u.o src/trace.o src/utils.o $(pkg-config --libs libtracefs) src/osnoise.o: file not recognized: file format not recognized clang: error: linker command failed with exit code 1 (use -v to see invocation) make: *** [Makefile:110: rtla] Error 1 Solve these issues by: - removing -ffat-lto-objects and -Wno-maybe-uninitialized if using clang - informing the linker about -flto=auto Cc: sta...@vger.kernel.org Fixes: 1a7b22ab15eb ("tools/rtla: Build with EXTRA_{C,LD}FLAGS") Suggested-by: Donald Zickus Signed-off-by: Daniel Bristot de Oliveira --- tools/tracing/rtla/Makefile | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile index 2456a399eb9a..afd18c678ff5 100644 --- a/tools/tracing/rtla/Makefile +++ b/tools/tracing/rtla/Makefile @@ -28,10 +28,15 @@ FOPTS := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \ -fasynchronous-unwind-tables -fstack-clash-protection WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized +ifeq ($(CC),clang) + FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS)) + WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS)) +endif + TRACEFS_HEADERS:= $$($(PKG_CONFIG) --cflags libtracefs) CFLAGS := -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -LDFLAGS:= -ggdb $(EXTRA_LDFLAGS) +LDFLAGS:= -flto=auto -ggdb $(EXTRA_LDFLAGS) LIBS := $$($(PKG_CONFIG) --libs libtracefs) SRC:= $(wildcard src/*.c) -- 2.43.0
[PATCH 0/6] tools: Fix rtla and rv problems (found) with clang
RHEL people reported some errors when compiling rtla and rv with clang. The command line used to compile the tools is: $ make HOSTCC=clang CC=clang LLVM_IAS=1 The first problem is two unsupported flags passed to the compiler: -ffat-lto-objects and -Wno-maybe-uninitialized. They will be removed if the compile is clang. Also, the clang linker does not automatically recognize the -flto=auto option used at compilation time, so it is explicitly set. With the compiler working, it starts pointing to some warnings and errors about uninitialized variables, variable size, and an unused function. These problems are also fixed. Daniel Bristot de Oliveira (6): tools/rtla: Fix Makefile compiler options for clang tools/rtla: Fix uninitialized bucket/data->bucket_size warning tools/rtla: Fix clang warning about mount_point var size tools/rtla: Remove unused sched_getattr() function tools/rv: Fix Makefile compiler options for clang tools/rv: Fix curr_reactor uninitialized variable tools/tracing/rtla/Makefile| 7 ++- tools/tracing/rtla/src/osnoise_hist.c | 3 +-- tools/tracing/rtla/src/timerlat_hist.c | 3 +-- tools/tracing/rtla/src/utils.c | 8 +--- tools/verification/rv/Makefile | 7 ++- tools/verification/rv/src/in_kernel.c | 2 +- 6 files changed, 16 insertions(+), 14 deletions(-) -- 2.43.0
Re: [PATCH] tracing/timerlat: Move hrtimer_init to timerlat_fd open()
On 2/1/24 16:44, Greg KH wrote: > On Thu, Feb 01, 2024 at 04:13:39PM +0100, Daniel Bristot de Oliveira wrote: >> Currently, the timerlat's hrtimer is initialized at the first read of >> timerlat_fd, and destroyed at close(). It works, but it causes an error >> if the user program open() and close() the file without reading. > > What error exactly happens? Userspace, or the kernel crashes? sorry, kernel crash: # echo NO_OSNOISE_WORKLOAD > /sys/kernel/debug/tracing/osnoise/options # echo timerlat > /sys/kernel/debug/tracing/current_tracer # cat ./timerlat_load.py #!/usr/bin/env python3 timerlat_fd = open("/sys/kernel/tracing/osnoise/per_cpu/cpu0/timerlat_fd", 'r') timerlat_fd.close(); # ./taskset -c 0 ./timerlat_load.py [ 6401.611374] BUG: kernel NULL pointer dereference, address: 0010 [ 6401.611786] #PF: supervisor read access in kernel mode [ 6401.612081] #PF: error_code(0x) - not-present page [ 6401.612376] PGD 0 P4D 0 [ 6401.612495] Oops: [#1] PREEMPT SMP NOPTI [ 6401.612690] CPU: 1 PID: 2673 Comm: python3 Not tainted 6.6.13-200.fc39.x86_64 #1 [ 6401.613011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014 [ 6401.613379] RIP: 0010:hrtimer_active+0xd/0x50 [ 6401.613577] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 57 30 <8b> 42 10 a8 01 74 09 f3 90 8b 42 10 a8 01 75 f7 80 7f 38 00 75 1d [ 6401.614374] RSP: 0018:b031009b7e10 EFLAGS: 00010286 [ 6401.614604] RAX: 0002db00 RBX: 9118f786db08 RCX: [ 6401.614914] RDX: RSI: 9117a0e64400 RDI: 9118f786db08 [ 6401.615225] RBP: 9118f786db80 R08: 9117a0ddd420 R09: 9117804d4f70 [ 6401.615534] R10: R11: R12: 9118f786db08 [ 6401.615846] R13: 91178fdd5e20 R14: 9117840978c0 R15: [ 6401.616157] FS: 7f2ffbab1740() GS:9118f784() knlGS: [ 6401.616508] CS: 0010 DS: ES: CR0: 80050033 [ 6401.616765] CR2: 0010 CR3: 0001b402e000 CR4: 00750ee0 [ 6401.617075] PKRU: 5554 [ 6401.617197] Call Trace: [ 6401.617309] [ 6401.617407] ? __die+0x23/0x70 [ 6401.617548] ? page_fault_oops+0x171/0x4e0 [ 6401.617983] ? srso_alias_return_thunk+0x5/0x7f [ 6401.618389] ? avc_has_extended_perms+0x237/0x520 [ 6401.618800] ? exc_page_fault+0x7f/0x180 [ 6401.619176] ? asm_exc_page_fault+0x26/0x30 [ 6401.619563] ? hrtimer_active+0xd/0x50 [ 6401.619926] hrtimer_cancel+0x15/0x40 [ 6401.620286] timerlat_fd_release+0x48/0xe0 [ 6401.620666] __fput+0xf5/0x290 [ 6401.621004] __x64_sys_close+0x3d/0x80 [ 6401.621370] do_syscall_64+0x60/0x90 [ 6401.621730] ? srso_alias_return_thunk+0x5/0x7f [ 6401.622129] ? __x64_sys_ioctl+0x72/0xd0 [ 6401.622503] ? srso_alias_return_thunk+0x5/0x7f [ 6401.622902] ? syscall_exit_to_user_mode+0x2b/0x40 [ 6401.623309] ? srso_alias_return_thunk+0x5/0x7f [ 6401.623703] ? do_syscall_64+0x6c/0x90 [ 6401.624063] ? srso_alias_return_thunk+0x5/0x7f [ 6401.624457] ? exit_to_user_mode_prepare+0x142/0x1f0 [ 6401.624868] ? srso_alias_return_thunk+0x5/0x7f [ 6401.625262] ? syscall_exit_to_user_mode+0x2b/0x40 [ 6401.625663] ? srso_alias_return_thunk+0x5/0x7f [ 6401.626051] ? do_syscall_64+0x6c/0x90 [ 6401.626404] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 6401.626810] RIP: 0033:0x7f2ffb321594 [ 6401.627164] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d d5 cd 0d 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c c3 0f 1f 00 55 48 89 e5 48 83 ec 10 89 7d [ 6401.628345] RSP: 002b:7ffe8d8eef18 EFLAGS: 0202 ORIG_RAX: 0003 [ 6401.628867] RAX: ffda RBX: 7f2ffba4e668 RCX: 7f2ffb321594 [ 6401.629372] RDX: RSI: RDI: 0003 [ 6401.629879] RBP: 7ffe8d8eef40 R08: R09: [ 6401.630384] R10: 55c926e3167eae79 R11: 0202 R12: 0003 [ 6401.630889] R13: 7ffe8d8ef030 R14: R15: 7f2ffba4e668 [ 6401.631394] [ 6401.631691] Modules linked in: tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr sunrpc pktcdvd intel_rapl_msr snd_hda_codec_generic intel_rapl_common ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm kvm_amd iTCO_wdt snd_timer intel_pmc_bxt ccp joydev iTCO_vendor_support kvm irqbypass i2c_i801 snd i2c_smbus soundcore lpc_ich virtio_balloon loop zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_net virtio_gpu virtio_console vi
Re: [PATCH] tracing/timerlat: Move hrtimer_init to timerlat_fd open()
On 2/1/24 16:25, Steven Rostedt wrote: > On Thu, 1 Feb 2024 16:13:39 +0100 > Daniel Bristot de Oliveira wrote: > >> Currently, the timerlat's hrtimer is initialized at the first read of >> timerlat_fd, and destroyed at close(). It works, but it causes an error >> if the user program open() and close() the file without reading. >> >> Move hrtimer_init to timerlat_fd open() to avoid this problem. >> >> No functional changes. > > It can't be fixing something and not have any functional changes. > > No functional changes means the code is restructured but the resulting > assembly would be the same. > > Like moving functions around in a file so that you don't need extra > prototype declarations. > > Please only add "No functional changes" if the function's assembly would be > the same. ok >> >> Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface") > > With a fixes tag, I'm assuming his should go into v6.8 with a Cc stable? right, I added it on Cc, but did not include the Cc:.. tag. It seems that I should have. -- Daniel > -- Steve > > >> Signed-off-by: Daniel Bristot de Oliveira >> --- >> kernel/trace/trace_osnoise.c | 6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) >> >> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c >> index bd0d01d00fb9..a8e28f9b9271 100644 >> --- a/kernel/trace/trace_osnoise.c >> +++ b/kernel/trace/trace_osnoise.c >> @@ -2444,6 +2444,9 @@ static int timerlat_fd_open(struct inode *inode, >> struct file *file) >> tlat = this_cpu_tmr_var(); >> tlat->count = 0; >> >> +hrtimer_init(>timer, CLOCK_MONOTONIC, >> HRTIMER_MODE_ABS_PINNED_HARD); >> +tlat->timer.function = timerlat_irq; >> + >> migrate_enable(); >> return 0; >> }; >> @@ -2526,9 +2529,6 @@ timerlat_fd_read(struct file *file, char __user *ubuf, >> size_t count, >> tlat->tracing_thread = false; >> tlat->kthread = current; >> >> -hrtimer_init(>timer, CLOCK_MONOTONIC, >> HRTIMER_MODE_ABS_PINNED_HARD); >> -tlat->timer.function = timerlat_irq; >> - >> /* Annotate now to drift new period */ >> tlat->abs_period = hrtimer_cb_get_time(>timer); >> >
[PATCH] tracing/timerlat: Move hrtimer_init to timerlat_fd open()
Currently, the timerlat's hrtimer is initialized at the first read of timerlat_fd, and destroyed at close(). It works, but it causes an error if the user program open() and close() the file without reading. Move hrtimer_init to timerlat_fd open() to avoid this problem. No functional changes. Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface") Signed-off-by: Daniel Bristot de Oliveira --- kernel/trace/trace_osnoise.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c index bd0d01d00fb9..a8e28f9b9271 100644 --- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -2444,6 +2444,9 @@ static int timerlat_fd_open(struct inode *inode, struct file *file) tlat = this_cpu_tmr_var(); tlat->count = 0; + hrtimer_init(>timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED_HARD); + tlat->timer.function = timerlat_irq; + migrate_enable(); return 0; }; @@ -2526,9 +2529,6 @@ timerlat_fd_read(struct file *file, char __user *ubuf, size_t count, tlat->tracing_thread = false; tlat->kthread = current; - hrtimer_init(>timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED_HARD); - tlat->timer.function = timerlat_irq; - /* Annotate now to drift new period */ tlat->abs_period = hrtimer_cb_get_time(>timer); -- 2.43.0
[GIT PULL] tracing/tools: Updates for 6.7
Steven, Tracing tools changes for 6.7: RTLA: - On rtla/utils.c, initialize the 'found' variable to avoid garbage when a mount point is not found. Verification: - Remove duplicated imports on dot2k python script Please pull the latest tracing-tools-v6.7 tree, which can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git tracing-tools-v6.7 Tag SHA1: eadb7e76bc7ce4174d4905a29a4a07cd835508fe Head SHA1: 696444a544ecd6d62c1edc89516b376cefb28929 Alessandro Carminati (Red Hat) (1): verification/dot2k: Delete duplicate imports Colin Ian King (1): rtla: Fix uninitialized variable found tools/tracing/rtla/src/utils.c | 2 +- tools/verification/dot2/dot2k | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) --- diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c index 623a38908ed5..c769d7b3842c 100644 --- a/tools/tracing/rtla/src/utils.c +++ b/tools/tracing/rtla/src/utils.c @@ -538,7 +538,7 @@ static const int find_mount(const char *fs, char *mp, int sizeof_mp) { char mount_point[MAX_PATH]; char type[100]; - int found; + int found = 0; FILE *fp; fp = fopen("/proc/mounts", "r"); diff --git a/tools/verification/dot2/dot2k b/tools/verification/dot2/dot2k index 9dcd38abe20a..d4d7e52d549e 100644 --- a/tools/verification/dot2/dot2k +++ b/tools/verification/dot2/dot2k @@ -15,8 +15,6 @@ if __name__ == '__main__': import os import platform import sys -import sys -import argparse parser = argparse.ArgumentParser(description='transform .dot file into kernel rv monitor') parser.add_argument('-d', "--dot", dest="dot_file", required=True)
Re: [PATCH v5] scripts/link-vmlinux.sh: Add alias to duplicate symbols for kallsyms
>> In any case, as you've pointed out, duplicates can arise from names in code >> that is not intended to be a module. >> Therefore, relying solely on the module name would not fully address the >> problem you initially aimed to solve. > > From my POV: > > The source path and the line number is enough to distinguish duplicate > symbols even in modules. > > The added module name would just add extra complexity into the kernel > and tools parsing and using the alias. The tracing tools would need to > handle the source path and line number anyway for symbols duplicated > within same module/vmlinux. > > Adding module name for builtin modules might be misleading. It won't > be clear which symbols are in vmlinux binary and which are in > real modules. +1 -- Daniel
Re: [PATCH v1] Ftrace: make sched_wakeup can focus on the target process
On 10/9/23 17:37, Jinyu Tang wrote: > $ cyclictest --mlockall --smp --priority=99 & rtla timerlat -a will give you an structured analysis of your latency... https://bristot.me/linux-scheduling-latency-debug-and-analysis/ -- Daniel
Re: [PATCH V3] tracing/timerlat: Hotplug support for the user-space interface
On 10/4/23 03:03, Steven Rostedt wrote: > On Fri, 29 Sep 2023 17:02:46 +0200 > Daniel Bristot de Oliveira wrote: > >> The osnoise/per_cpu/CPU$/timerlat_fd is create for each possible >> CPU, but it might create confusion if the CPU is not online. >> >> Create the file only for online CPUs, also follow hotplug by >> creating and deleting as CPUs come and go. >> >> Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface") > > Is this a fix that needs to go in now and Cc'd to stable? Or is this > something that can wait till the next merge window? We can wait for the next merge window... it is a non-trivial fix. -- Daniel > -- Steve
[GIT PULL] rtla: Fixes for 6.6
Steven, Timerlat auto-analysis: - Timerlat is reporting thread interference time without thread noise events occurrence. It was caused because the thread interference variable was not reset after the analysis of a timerlat activation that did not hit the threshold. - The IRQ handler delay is estimated from the delta of the IRQ latency reported by timerlat, and the timestamp from IRQ handler start event. If the delta is near-zero, the drift from the external clock and the trace event and/or the overhead can cause the value to be negative. If the value is negative, print a zero-delay. - IRQ handlers happening after the timerlat thread event but before the stop tracing were being reported as IRQ that happened before the *current* IRQ occurrence. Ignore Previous IRQ noise in this condition because they are valid only for the *next* timerlat activation. Timerlat user-space: - Timerlat is stopping all user-space thread if a CPU becomes offline. Do not stop the entire tool if a CPU is/become offline, but only the thread of the unavailable CPU. Stop the tool only, if all threads leave because the CPUs become/are offline. man-pages: - Fix command line example in timerlat hist man page. Please pull the latest rtla-v6.6-fixes tree, which can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git rtla-v6.6-fixes Tag SHA1: 6a0df51b5c0a075a553a5ad73b1fd421f559eb6b Head SHA1: 81ec384b80ffbda752c230778d39ea620c7e3bcf Daniel Bristot de Oliveira (4): rtla/timerlat_aa: Zero thread sum after every sample analysis rtla/timerlat_aa: Fix negative IRQ delay rtla/timerlat_aa: Fix previous IRQ delay for IRQs that happens after thread sample rtla/timerlat: Do not stop user-space if a cpu is offline Xie XiuQi (1): rtla: fix a example in rtla-timerlat-hist.rst Documentation/tools/rtla/rtla-timerlat-hist.rst | 4 ++-- tools/tracing/rtla/src/timerlat_aa.c| 32 - tools/tracing/rtla/src/timerlat_u.c | 6 +++-- 3 files changed, 32 insertions(+), 10 deletions(-) --- diff --git a/Documentation/tools/rtla/rtla-timerlat-hist.rst b/Documentation/tools/rtla/rtla-timerlat-hist.rst index 057db78d4095..03b7f3deb069 100644 --- a/Documentation/tools/rtla/rtla-timerlat-hist.rst +++ b/Documentation/tools/rtla/rtla-timerlat-hist.rst @@ -36,11 +36,11 @@ EXAMPLE In the example below, **rtla timerlat hist** is set to run for *10* minutes, in the cpus *0-4*, *skipping zero* only lines. Moreover, **rtla timerlat hist** will change the priority of the *timerlat* threads to run under -*SCHED_DEADLINE* priority, with a *10us* runtime every *1ms* period. The +*SCHED_DEADLINE* priority, with a *100us* runtime every *1ms* period. The *1ms* period is also passed to the *timerlat* tracer. Auto-analysis is disabled to reduce overhead :: - [root@alien ~]# timerlat hist -d 10m -c 0-4 -P d:100us:1ms -p 1ms --no-aa + [root@alien ~]# timerlat hist -d 10m -c 0-4 -P d:100us:1ms -p 1000 --no-aa # RTLA timerlat histogram # Time unit is microseconds (us) # Duration: 0 00:10:00 diff --git a/tools/tracing/rtla/src/timerlat_aa.c b/tools/tracing/rtla/src/timerlat_aa.c index e0ffe69c271c..7093fd5333be 100644 --- a/tools/tracing/rtla/src/timerlat_aa.c +++ b/tools/tracing/rtla/src/timerlat_aa.c @@ -159,6 +159,7 @@ static int timerlat_aa_irq_latency(struct timerlat_aa_data *taa_data, taa_data->thread_nmi_sum = 0; taa_data->thread_irq_sum = 0; taa_data->thread_softirq_sum = 0; + taa_data->thread_thread_sum = 0; taa_data->thread_blocking_duration = 0; taa_data->timer_irq_start_time = 0; taa_data->timer_irq_duration = 0; @@ -337,7 +338,23 @@ static int timerlat_aa_irq_handler(struct trace_seq *s, struct tep_record *recor taa_data->timer_irq_start_time = start; taa_data->timer_irq_duration = duration; - taa_data->timer_irq_start_delay = taa_data->timer_irq_start_time - expected_start; + /* +* We are dealing with two different clock sources: the +* external clock source that timerlat uses as a reference +* and the clock used by the tracer. There are also two +* moments: the time reading the clock and the timer in +* which the event is placed in the buffer (the trace +* event timestamp). If the processor is slow or there +* is some hardware noise, the difference between the +* timestamp and the external clock read can be longer +* than the IRQ handler delay, resulting in a negative +* time. If so, set IRQ start delay as 0. In the end, +* it is less relevant than the noise. +*/ + if (expected_start
[PATCH V3] tracing/timerlat: Hotplug support for the user-space interface
The osnoise/per_cpu/CPU$/timerlat_fd is create for each possible CPU, but it might create confusion if the CPU is not online. Create the file only for online CPUs, also follow hotplug by creating and deleting as CPUs come and go. Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface") Signed-off-by: Daniel Bristot de Oliveira --- Changes from V2: - Better split the code into the generic (per_cpu/cpu$) and timerlat (/timerlat_fd) specific function (Daniel) - Fixed a cpus_read_lock/unlock() usage (kbuild test) Link: https://lore.kernel.org/lkml/6b9a5f306e488bc77bf8521faeade420a0adf3e4.1695224204.git.bris...@kernel.org/ Changes from V1: - Fix compilation issue when !HOTPLUG - Fix init interface | hotplug race Link: https://lore.kernel.org/lkml/b619d9fd08a3bb47018cf40afa95783844a3c1fd.1694789910.git.bris...@kernel.org/ kernel/trace/trace_osnoise.c | 149 --- 1 file changed, 121 insertions(+), 28 deletions(-) diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c index bd0d01d00fb9..f375fb1e721d 100644 --- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -229,6 +229,19 @@ static inline struct osnoise_variables *this_cpu_osn_var(void) } #ifdef CONFIG_TIMERLAT_TRACER + +/* + * osnoise/per_cpu dir + */ +static struct dentry *osnoise_per_cpu_fd; + +struct osnoise_per_cpu_dir { + struct dentry *root; + struct dentry *timerlat_fd; +}; + +static DEFINE_PER_CPU(struct osnoise_per_cpu_dir, osnoise_per_cpu_dir); + /* * Runtime information for the timer mode. */ @@ -2000,6 +2013,9 @@ static int start_kthread(unsigned int cpu) char comm[24]; if (timerlat_enabled()) { + if (!test_bit(OSN_WORKLOAD, _options)) + return 0; + snprintf(comm, 24, "timerlat/%d", cpu); main = timerlat_main; } else { @@ -2065,19 +2081,99 @@ static int start_per_cpu_kthreads(void) return retval; } +#ifdef CONFIG_TIMERLAT_TRACER +static struct dentry *osnoise_get_per_cpu_dir(long cpu) +{ + struct dentry *cpu_dir_entry = per_cpu_ptr(_per_cpu_dir, cpu)->root; + char cpu_str[30]; /* see trace.c: tracing_init_tracefs_percpu() */ + + if (cpu_dir_entry) + return cpu_dir_entry; + + if (!osnoise_per_cpu_fd) + return NULL; + + snprintf(cpu_str, 30, "cpu%ld", cpu); + cpu_dir_entry = tracefs_create_dir(cpu_str, osnoise_per_cpu_fd); + if (!cpu_dir_entry) + return NULL; + + per_cpu_ptr(_per_cpu_dir, cpu)->root = cpu_dir_entry; + + return cpu_dir_entry; +} + +static void osnoise_put_per_cpu_dir(long cpu) +{ + struct dentry *cpu_dir_entry = per_cpu_ptr(_per_cpu_dir, cpu)->root; + + if (!cpu_dir_entry) + return; + + tracefs_remove(cpu_dir_entry); + + per_cpu_ptr(_per_cpu_dir, cpu)->root = NULL; +} + +static const struct file_operations timerlat_fd_fops; + +static int timerlat_add_per_cpu_interface(long cpu) +{ + struct dentry *cpu_dir_entry = osnoise_get_per_cpu_dir(cpu); + struct dentry *timerlat_fd; + + lockdep_assert_cpus_held(); + + if (!cpu_dir_entry) + return -ENOMEM; + + timerlat_fd = trace_create_file("timerlat_fd", TRACE_MODE_READ, cpu_dir_entry, NULL, + _fd_fops); + if (!timerlat_fd) + goto out_put; + + per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd = timerlat_fd; + + /* Record the CPU */ + d_inode(timerlat_fd)->i_cdev = (void *)(cpu); + + return 0; + +out_put: + osnoise_put_per_cpu_dir(cpu); + return -ENOMEM; +} + +static void timerlat_rm_per_cpu_interface(long cpu) +{ + struct dentry *timerlat_fd = per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd; + + if (timerlat_fd) { + tracefs_remove(timerlat_fd); + per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd = NULL; + } + + osnoise_put_per_cpu_dir(cpu); +} +#elif defined(CONFIG_HOTPLUG_CPU) +static int timerlat_add_per_cpu_interface(long cpu) { return 0; }; +static void timerlat_rm_per_cpu_interface(long cpu) {}; +#endif + #ifdef CONFIG_HOTPLUG_CPU static void osnoise_hotplug_workfn(struct work_struct *dummy) { unsigned int cpu = smp_processor_id(); mutex_lock(_types_lock); - - if (!osnoise_has_registered_instances()) - goto out_unlock_trace; - mutex_lock(_lock); cpus_read_lock(); + timerlat_add_per_cpu_interface(cpu); + + if (!osnoise_has_registered_instances()) + goto out_unlock; + if (!cpumask_test_cpu(cpu, _cpumask)) goto out_unlock; @@ -2086,7 +2182,6 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy) out_unlock: cpus_read_unlock(); mutex_un
[PATCH V2] tracing/timerlat: Hotplug support for the user-space interface
The osnoise/per_cpu/CPU$/timerlat_fd is create for each possible CPU, but it might create confusion if the CPU is not online. Create the file only for online CPUs, also follow hotplug by creating and deleting as CPUs come and go. Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface") Signed-off-by: Daniel Bristot de Oliveira --- Changes from V1: - Fix compilation issue when !HOTPLUG - Fix init interface | hotplug race Link: https://lore.kernel.org/lkml/b619d9fd08a3bb47018cf40afa95783844a3c1fd.1694789910.git.bris...@kernel.org/ kernel/trace/trace_osnoise.c | 112 +++ 1 file changed, 86 insertions(+), 26 deletions(-) diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c index bd0d01d00fb9..8422562d9864 100644 --- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -229,6 +229,19 @@ static inline struct osnoise_variables *this_cpu_osn_var(void) } #ifdef CONFIG_TIMERLAT_TRACER + +/* + * osnoise/per_cpu dir + */ +static struct dentry *osnoise_per_cpu_fd; + +struct osnoise_per_cpu_dir { + struct dentry *root; + struct dentry *timerlat_fd; +}; + +static DEFINE_PER_CPU(struct osnoise_per_cpu_dir, osnoise_per_cpu_dir); + /* * Runtime information for the timer mode. */ @@ -2000,6 +2013,9 @@ static int start_kthread(unsigned int cpu) char comm[24]; if (timerlat_enabled()) { + if (!test_bit(OSN_WORKLOAD, _options)) + return 0; + snprintf(comm, 24, "timerlat/%d", cpu); main = timerlat_main; } else { @@ -2065,19 +2081,64 @@ static int start_per_cpu_kthreads(void) return retval; } +#ifdef CONFIG_TIMERLAT_TRACER +static const struct file_operations timerlat_fd_fops; +static int timerlat_add_per_cpu_interface(long cpu) +{ + struct dentry *timerlat_fd, *cpu_dir_fd; + char cpu_str[30]; /* see trace.c: tracing_init_tracefs_percpu() */ + + if (!osnoise_per_cpu_fd) + return 0; + + snprintf(cpu_str, 30, "cpu%ld", cpu); + cpu_dir_fd = tracefs_create_dir(cpu_str, osnoise_per_cpu_fd); + + if (cpu_dir_fd) { + timerlat_fd = trace_create_file("timerlat_fd", TRACE_MODE_READ, + cpu_dir_fd, NULL, _fd_fops); + WARN_ON_ONCE(!timerlat_fd); + per_cpu_ptr(_per_cpu_dir, cpu)->root = cpu_dir_fd; + per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd = timerlat_fd; + + /* Record the CPU */ + d_inode(timerlat_fd)->i_cdev = (void *)(cpu); + + return 0; + } + + return -ENOMEM; +} + +static void timerlat_rm_per_cpu_interface(long cpu) +{ + struct dentry *cpu_dir = per_cpu_ptr(_per_cpu_dir, cpu)->root; + + if (cpu_dir) { + tracefs_remove(cpu_dir); + per_cpu_ptr(_per_cpu_dir, cpu)->root = NULL; + per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd = NULL; + } +} +#elif defined(CONFIG_HOTPLUG_CPU) +static int timerlat_add_per_cpu_interface(long cpu) { return 0; }; +static void timerlat_rm_per_cpu_interface(long cpu) {}; +#endif + #ifdef CONFIG_HOTPLUG_CPU static void osnoise_hotplug_workfn(struct work_struct *dummy) { unsigned int cpu = smp_processor_id(); mutex_lock(_types_lock); - - if (!osnoise_has_registered_instances()) - goto out_unlock_trace; - mutex_lock(_lock); cpus_read_lock(); + timerlat_add_per_cpu_interface(cpu); + + if (!osnoise_has_registered_instances()) + goto out_unlock; + if (!cpumask_test_cpu(cpu, _cpumask)) goto out_unlock; @@ -2086,7 +2147,6 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy) out_unlock: cpus_read_unlock(); mutex_unlock(_lock); -out_unlock_trace: mutex_unlock(_types_lock); } @@ -2106,6 +2166,7 @@ static int osnoise_cpu_init(unsigned int cpu) */ static int osnoise_cpu_die(unsigned int cpu) { + timerlat_rm_per_cpu_interface(cpu); stop_kthread(cpu); return 0; } @@ -2708,10 +2769,7 @@ static int init_timerlat_stack_tracefs(struct dentry *top_dir) static int osnoise_create_cpu_timerlat_fd(struct dentry *top_dir) { - struct dentry *timerlat_fd; - struct dentry *per_cpu; - struct dentry *cpu_dir; - char cpu_str[30]; /* see trace.c: tracing_init_tracefs_percpu() */ + int retval; long cpu; /* @@ -2720,29 +2778,24 @@ static int osnoise_create_cpu_timerlat_fd(struct dentry *top_dir) * Because osnoise/timerlat have a single workload, having * multiple files like these are wast of memory. */ - per_cpu = tracefs_create_dir("per_cpu", top_dir); - if (!per_cpu) + osnoise_per_cpu_fd = tra
Re: [PATCH] tracing/timerlat: Hotplug support for the user-space interface
On 9/16/23 02:21, kernel test robot wrote: > Hi Daniel, > > kernel test robot noticed the following build warnings: > > [auto build test WARNING on linus/master] > [also build test WARNING on v6.6-rc1 next-20230915] > [If your patch is applied to the wrong git tree, kindly drop us a note. > And when submitting patch, we suggest to use '--base' as documented in > https://git-scm.com/docs/git-format-patch#_base_tree_information] > > url: > https://github.com/intel-lab-lkp/linux/commits/Daniel-Bristot-de-Oliveira/tracing-timerlat-Hotplug-support-for-the-user-space-interface/20230915-230157 > base: linus/master > patch link: > https://lore.kernel.org/r/b619d9fd08a3bb47018cf40afa95783844a3c1fd.1694789910.git.bristot%40kernel.org > patch subject: [PATCH] tracing/timerlat: Hotplug support for the user-space > interface > config: um-randconfig-002-20230916 > (https://download.01.org/0day-ci/archive/20230916/202309160854.saw0rium-...@intel.com/config) > ^^ The config has no timerlat and no hotplug... compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 > reproduce (this is a W=1 build): > (https://download.01.org/0day-ci/archive/20230916/202309160854.saw0rium-...@intel.com/reproduce) > > If you fix the issue in a separate patch/commit (i.e. not just a new version > of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot > | Closes: > https://lore.kernel.org/oe-kbuild-all/202309160854.saw0rium-...@intel.com/ > > All warnings (new ones prefixed by >>): > >>> kernel/trace/trace_osnoise.c:2125:13: warning: >>> 'timerlat_rm_per_cpu_interface' defined but not used [-Wunused-function] > 2125 | static void timerlat_rm_per_cpu_interface(long cpu) {}; > | ^ >>> kernel/trace/trace_osnoise.c:2124:12: warning: >>> 'timerlat_add_per_cpu_interface' defined but not used [-Wunused-function] > 2124 | static int timerlat_add_per_cpu_interface(long cpu) { return 0; }; > |^~ These functions are called when hotplug but no timerlat when no hotplug: defined but not used. > > vim +/timerlat_rm_per_cpu_interface +2125 kernel/trace/trace_osnoise.c > > 2112 > 2113static void timerlat_rm_per_cpu_interface(long cpu) > 2114{ > 2115struct dentry *cpu_dir = > per_cpu_ptr(_per_cpu_dir, cpu)->root; > 2116 > 2117if (cpu_dir) { > 2118tracefs_remove(cpu_dir); > 2119per_cpu_ptr(_per_cpu_dir, cpu)->root = > NULL; > 2120per_cpu_ptr(_per_cpu_dir, > cpu)->timerlat_fd = NULL; > 2121} > 2122} > 2123#else >> 2124 static int timerlat_add_per_cpu_interface(long cpu) { return 0; }; >> 2125 static void timerlat_rm_per_cpu_interface(long cpu) {}; > 2126#endif > 2127 > Fixing it. -- Daniel
[PATCH] tracing/timerlat: Hotplug support for the user-space interface
The osnoise/per_cpu/CPU$/timerlat_fd is create for each possible CPU, but it might create confusion if the CPU is not online. Create the file only for online CPUs, also follow hotplug by creating and deleting as CPUs come and go. Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface") Signed-off-by: Daniel Bristot de Oliveira --- kernel/trace/trace_osnoise.c | 101 ++- 1 file changed, 77 insertions(+), 24 deletions(-) diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c index bd0d01d00fb9..1af01eec3e36 100644 --- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -229,6 +229,19 @@ static inline struct osnoise_variables *this_cpu_osn_var(void) } #ifdef CONFIG_TIMERLAT_TRACER + +/* + * osnoise/per_cpu dir + */ +static struct dentry *osnoise_per_cpu_fd; + +struct osnoise_per_cpu_dir { + struct dentry *root; + struct dentry *timerlat_fd; +}; + +static DEFINE_PER_CPU(struct osnoise_per_cpu_dir, osnoise_per_cpu_dir); + /* * Runtime information for the timer mode. */ @@ -2000,6 +2013,9 @@ static int start_kthread(unsigned int cpu) char comm[24]; if (timerlat_enabled()) { + if (!test_bit(OSN_WORKLOAD, _options)) + return 0; + snprintf(comm, 24, "timerlat/%d", cpu); main = timerlat_main; } else { @@ -2065,19 +2081,64 @@ static int start_per_cpu_kthreads(void) return retval; } +#ifdef CONFIG_TIMERLAT_TRACER +static const struct file_operations timerlat_fd_fops; +static int timerlat_add_per_cpu_interface(long cpu) +{ + struct dentry *timerlat_fd, *cpu_dir_fd; + char cpu_str[30]; /* see trace.c: tracing_init_tracefs_percpu() */ + + if (!osnoise_per_cpu_fd) + return 0; + + snprintf(cpu_str, 30, "cpu%ld", cpu); + cpu_dir_fd = tracefs_create_dir(cpu_str, osnoise_per_cpu_fd); + + if (cpu_dir_fd) { + timerlat_fd = trace_create_file("timerlat_fd", TRACE_MODE_READ, + cpu_dir_fd, NULL, _fd_fops); + WARN_ON_ONCE(!timerlat_fd); + per_cpu_ptr(_per_cpu_dir, cpu)->root = cpu_dir_fd; + per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd = timerlat_fd; + + /* Record the CPU */ + d_inode(timerlat_fd)->i_cdev = (void *)(cpu); + + return 0; + } + + return -ENOMEM; +} + +static void timerlat_rm_per_cpu_interface(long cpu) +{ + struct dentry *cpu_dir = per_cpu_ptr(_per_cpu_dir, cpu)->root; + + if (cpu_dir) { + tracefs_remove(cpu_dir); + per_cpu_ptr(_per_cpu_dir, cpu)->root = NULL; + per_cpu_ptr(_per_cpu_dir, cpu)->timerlat_fd = NULL; + } +} +#else +static int timerlat_add_per_cpu_interface(long cpu) { return 0; }; +static void timerlat_rm_per_cpu_interface(long cpu) {}; +#endif + #ifdef CONFIG_HOTPLUG_CPU static void osnoise_hotplug_workfn(struct work_struct *dummy) { unsigned int cpu = smp_processor_id(); mutex_lock(_types_lock); - - if (!osnoise_has_registered_instances()) - goto out_unlock_trace; - mutex_lock(_lock); cpus_read_lock(); + timerlat_add_per_cpu_interface(cpu); + + if (!osnoise_has_registered_instances()) + goto out_unlock; + if (!cpumask_test_cpu(cpu, _cpumask)) goto out_unlock; @@ -2086,7 +2147,6 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy) out_unlock: cpus_read_unlock(); mutex_unlock(_lock); -out_unlock_trace: mutex_unlock(_types_lock); } @@ -2106,6 +2166,7 @@ static int osnoise_cpu_init(unsigned int cpu) */ static int osnoise_cpu_die(unsigned int cpu) { + timerlat_rm_per_cpu_interface(cpu); stop_kthread(cpu); return 0; } @@ -2708,10 +2769,7 @@ static int init_timerlat_stack_tracefs(struct dentry *top_dir) static int osnoise_create_cpu_timerlat_fd(struct dentry *top_dir) { - struct dentry *timerlat_fd; - struct dentry *per_cpu; - struct dentry *cpu_dir; - char cpu_str[30]; /* see trace.c: tracing_init_tracefs_percpu() */ + int retval; long cpu; /* @@ -2720,29 +2778,24 @@ static int osnoise_create_cpu_timerlat_fd(struct dentry *top_dir) * Because osnoise/timerlat have a single workload, having * multiple files like these are wast of memory. */ - per_cpu = tracefs_create_dir("per_cpu", top_dir); - if (!per_cpu) + osnoise_per_cpu_fd = tracefs_create_dir("per_cpu", top_dir); + if (!osnoise_per_cpu_fd) return -ENOMEM; - for_each_possible_cpu(cpu) { - snprintf(cpu_str, 30, "cpu%ld", cpu); - cpu_dir = tracefs_cre
[PATCH] tools/rtla: Do not stop user-space if a cpu is offline
If no CPU list is passed, timerlat in user-space will dispatch one thread per sysconf(_SC_NPROCESSORS_CONF). However, not all CPU might be available, for instance, if HT is disabled. Currently, rtla timerlat is stopping the session if an user-space thread cannot set affinity to a CPU, or if a running user-space thread is killed. However, this is too restrictive. So, reduce the error to a debug message, and rtla timerlat run as long as there is at least one user-space thread alive. Fixes: cdca4f4e5e8e ("rtla/timerlat_top: Add timerlat user-space support") Signed-off-by: Daniel Bristot de Oliveira --- tools/tracing/rtla/src/timerlat_u.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/tracing/rtla/src/timerlat_u.c b/tools/tracing/rtla/src/timerlat_u.c index 05e310696dd5..01dbf9a6b5a5 100644 --- a/tools/tracing/rtla/src/timerlat_u.c +++ b/tools/tracing/rtla/src/timerlat_u.c @@ -45,7 +45,7 @@ static int timerlat_u_main(int cpu, struct timerlat_u_params *params) retval = sched_setaffinity(gettid(), sizeof(set), ); if (retval == -1) { - err_msg("Error setting user thread affinity\n"); + debug_msg("Error setting user thread affinity %d, is the CPU online?\n", cpu); exit(1); } @@ -193,7 +193,9 @@ void *timerlat_u_dispatcher(void *data) procs_count--; } } - break; + + if (!procs_count) + break; } sleep(1); -- 2.38.1
Re: [PATCH v2] verification/dot2k: Delete duplicate imports
On 9/6/23 17:57, Alessandro Carminati (Red Hat) wrote: > The presence of duplicate import lines appears to be a typo. > Removing them. > > Fixes: 24bce201d798 ("tools/rv: Add dot2k") > Signed-off-by: Alessandro Carminati (Red Hat) Queued, thanks! https://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux.git/commit/?h=tools/verification=5a9587fea055163026b6d22d593fc64ed04de3a6 -- Daniel
Re: [PATCH] kernel:irq:manage: request threaded irq with a specified priority
On 4/16/21 6:57 AM, chensong wrote: > > > On 2021/4/13 下午4:39, Thomas Gleixner wrote: >> On Tue, Apr 13 2021 at 14:19, Song Chen wrote: >>> In general, irq handler thread will be assigned a default priority which >>> is MAX_RT_PRIO/2, as a result, no one can preempt others. >>> >>> Here is the case I found in a real project, an interrupt int_a is >>> coming, wakes up its handler handler_a and handler_a wakes up a >>> userspace RT process task_a. >>> >>> However, if another irq handler handler_b which has nothing to do >>> with any RT tasks is running when int_a is coming, handler_a can't >>> preempt handler_b, as a result, task_a can't be waken up immediately >>> as expected until handler_b gives up cpu voluntarily. In this case, >>> determinism breaks. >> >> It breaks because the system designer failed to assign proper priorities >> to the irq threads int_a, int_b and to the user space process task_a. > > yes, it's designers' responsibility to assign proper priorities, but > kernel is also obliged to provide interfaces for those designers. There is no optimal priority assignment for fixed priority schedulers without a prior knowledge of all tasks (and their timing behavior, e.g., exec time, activation pattern...). So, the developer will never know what is the best priority. Such fine tune should be done by the user. > > chrt can help designers in this case, however, the truth is lot of customers > are > not familiar with it. And making this change in C in kernel is just turning it even more complex. what's more, chrt can also apply to userspace rt task, but > userspace also has sched_setscheduler to assgin proper priority inside code > like > cyclictest, why can't driver writers have another choice? The developer of task_a can also use sched_setscheduler() to adjust the priority of the handler_a - or even better, decrease the priority of the handler_b as it is not that important. The developer is supposed to know how to change priority because task_a is RT too. Note that the user sets the priority on cyclictest (-p). -- Daniel
Re: [RFC PATCH 1/5] tracing/hwlat: Add a cpus file specific for hwlat_detector
On 4/15/21 3:49 PM, Steven Rostedt wrote: > OK, for osnoise, I can see how it is useful. But as you said above, for > hwlat tracer, it's not as useful. I agree, it is not as useful. -- Daniel
Re: [RFC PATCH 5/5] tracing: Add the osnoise tracer
On 4/14/21 7:14 PM, Steven Rostedt wrote: > On Thu, 8 Apr 2021 16:13:23 +0200 > Daniel Bristot de Oliveira wrote: > >> In the context of high-performance computing (HPC), the Operating System >> Noise (osnoise) refers to the interference experienced by an application >> due to activities inside the operating system. In the context of Linux, >> NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the >> system. Moreover, hardware-related jobs can also cause noise, for example, >> via SMIs. >> >> hwlat_detector is one of the tools used to identify the most complex >> source of noise: hardware noise. >> >> In a nutshell, the hwlat_detector creates a thread that runs >> periodically for a given period. At the beginning of a period, the thread >> disables interrupt and starts sampling. While running, the hwlatd >> thread reads the time in a loop. As interrupts are disabled, threads, >> IRQs, and SoftIRQs cannot interfere with the hwlatd thread. Hence, the >> cause of any gap between two different reads of the time roots either on >> NMI or in the hardware itself. At the end of the period, hwlatd enables >> interrupts and reports the max observed gap between the reads. It also >> prints an NMI occurrence counter. If the output does not report NMI >> executions, the user can conclude that the hardware is the culprit for >> the latency. The hwlat detects the NMI execution by observing >> the entry and exit of an NMI. >> >> The osnoise tracer leverages the hwlat_detector by running a >> similar loop with preemption, SoftIRQs and IRQs enabled, thus allowing >> all the sources of osnoise during its execution. Using the same approach >> of hwlat, osnoise takes note of the entry and exit point of any >> source of interferences, increasing a per-cpu interference counter. The >> osnoise tracer also saves an interference counter for each source of >> interference. The interference counter for NMI, IRQs, SoftIRQs, and >> threads is increased anytime the tool observes these interferences' entry >> events. When a noise happens without any interference from the operating >> system level, the hardware noise counter increases, pointing to a >> hardware-related noise. In this way, osnoise can account for any >> source of interference. At the end of the period, the osnoise tracer >> prints the sum of all noise, the max single noise, the percentage of CPU >> available for the thread, and the counters for the noise sources. >> >> Usage >> >> Write the ASCII text osnoise into the current_tracer file of the >> tracing system (generally mounted at /sys/kernel/tracing or >> /sys/kernel/debug/tracing). >> >> For example:: >> >> [root@f32 ~]# cd /sys/kernel/tracing/ >> [root@f32 tracing]# echo osnoise > current_tracer >> >> It is possible to follow the trace by reading the trace trace file:: >> >> [root@f32 tracing]# cat trace >> # tracer: osnoise >> # >> #_-=> irqs-off >> # / _=> need-resched >> # | / _---=> hardirq/softirq >> # || / _--=> preempt-depth >> MAX >> # || / >>SINGLE Interference counters: >> # RUNTIME NOISE >> % OF CPU NOISE+-+ >> # TASK-PID CPU# TIMESTAMPIN US IN US >> AVAILABLE IN US HWNMIIRQ SIRQ THREAD >> # | | | | | | >>|| | | | | | >><...>-859 [000] 81.637220: 100190 >> 99.98100 9 18 0 1007 18 1 >><...>-860 [001] 81.638154: 100656 >> 99.93440 74 23 0 1006 16 3 >><...>-861 [002] 81.638193: 100 5675 >> 99.43250 202 6 0 1013 25 21 >><...>-862 [003] 81.638242: 100125 >> 99.98750 45 1 0 1011 23 0 >><...>-863 [004] 81.638260: 100 1721 >> 99.82790 168 7 0 1002 49 41 >>
Re: [RFC PATCH 3/5] tracing/hwlat: Implement the per-cpu mode
On 4/14/21 4:41 PM, Steven Rostedt wrote: > On Thu, 8 Apr 2021 16:13:21 +0200 > Daniel Bristot de Oliveira wrote: > >> Implements the per-cpu mode in which a sampling thread is created for >> each cpu in the "cpus" (and tracing_mask). >> >> The per-cpu mode has the potention to speed up the hwlat detection by >> running on multiple CPUs at the same time. > > And totally slow down the entire system in the process ;-) Too :-) But this is not the default config... So it should be an intentional change. >> >> Cc: Jonathan Corbet >> Cc: Steven Rostedt >> Cc: Ingo Molnar >> Cc: Peter Zijlstra >> Cc: Thomas Gleixner >> Cc: Alexandre Chartre >> Cc: Clark Willaims >> Cc: John Kacur >> Cc: Juri Lelli >> Cc: linux-...@vger.kernel.org >> Cc: linux-kernel@vger.kernel.org >> Signed-off-by: Daniel Bristot de Oliveira >> >> --- >> Documentation/trace/hwlat_detector.rst | 6 +- >> kernel/trace/trace_hwlat.c | 171 +++-- >> 2 files changed, 137 insertions(+), 40 deletions(-) >> >> diff --git a/Documentation/trace/hwlat_detector.rst >> b/Documentation/trace/hwlat_detector.rst >> index f63fdd867598..7a6fab105b29 100644 >> --- a/Documentation/trace/hwlat_detector.rst >> +++ b/Documentation/trace/hwlat_detector.rst >> @@ -85,10 +85,12 @@ the available options are: >> >> - none:do not force migration >> - round-robin: migrate across each CPU specified in cpus between each >> window >> + - per-cpu: create a per-cpu thread for each cpu in cpus >> >> By default, hwlat detector will also obey the tracing_cpumask, so the thread >> will be placed only in the set of cpus that is both on the hwlat detector's >> cpus and in the global tracing_cpumask file. The user can overwrite the >> cpumask by setting it manually. Changing the hwlatd affinity externally, >> -e.g., via taskset tool, will disable the round-robin migration. >> - >> +e.g., via taskset tool, will disable the round-robin migration. In the >> +per-cpu mode, the per-cpu thread (hwlatd/CPU) will be pinned to its relative >> +cpu, and its affinity cannot be changed. >> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c >> index 3818200c9e24..52968ea312df 100644 >> --- a/kernel/trace/trace_hwlat.c >> +++ b/kernel/trace/trace_hwlat.c >> @@ -34,7 +34,7 @@ >> * Copyright (C) 2008-2009 Jon Masters, Red Hat, Inc. >> * Copyright (C) 2013-2016 Steven Rostedt, Red Hat, Inc. >> >> * >> - * Includes useful feedback from Clark Williams >> + * Includes useful feedback from Clark Williams > > Interesting update ;-) Should I make it a separated patch? :-) >> * >> */ >> #include >> @@ -54,9 +54,6 @@ static struct trace_array *hwlat_trace; >> #define DEFAULT_SAMPLE_WIDTH50 /* 0.5s */ >> #define DEFAULT_LAT_THRESHOLD 10 /* 10us */ >> >> -/* sampling thread*/ >> -static struct task_struct *hwlat_kthread; >> - >> static struct dentry *hwlat_sample_width; /* sample width us */ >> static struct dentry *hwlat_sample_window; /* sample window us */ >> static struct dentry *hwlat_cpumask_dentry; /* hwlat cpus allowed */ >> @@ -65,19 +62,27 @@ static struct dentry *hwlat_thread_mode; /* hwlat thread >> mode */ >> enum { >> MODE_NONE = 0, >> MODE_ROUND_ROBIN, >> +MODE_PER_CPU, >> MODE_MAX >> }; >> >> -static char *thread_mode_str[] = { "none", "round-robin" }; >> +static char *thread_mode_str[] = { "none", "round-robin", "per-cpu" }; >> >> /* Save the previous tracing_thresh value */ >> static unsigned long save_tracing_thresh; >> >> -/* NMI timestamp counters */ >> -static u64 nmi_ts_start; >> -static u64 nmi_total_ts; >> -static int nmi_count; >> -static int nmi_cpu; >> +/* runtime kthread data */ >> +struct hwlat_kthread_data { >> +struct task_struct *kthread; >> +/* NMI timestamp counters */ >> +u64 nmi_ts_start; >> +u64 nmi_total_ts; >> +int nmi_count; >> +int nmi_cpu; >> +}; >> + >> +struct hwlat_kthread_data hwlat_single_cpu_data; >> +DEFINE_PER_CPU(struct hwlat_kthread_data, hwlat_per_cpu_data); >> >> /* Tells NMIs to call back to the hwlat tracer to record timestamps */ >> bool trace_hwlat_callback_enabled; >> @@ -114,6 +119,14 @@ static struct hwlat_data { &g
Re: [RFC PATCH 2/5] tracing/hwlat: Implement the mode config option
On 4/14/21 4:30 PM, Steven Rostedt wrote: > On Thu, 8 Apr 2021 16:13:20 +0200 > Daniel Bristot de Oliveira wrote: > >> +/** >> + * hwlat_mode_write - Write function for "mode" entry >> + * @filp: The active open file structure >> + * @ubuf: The user buffer that contains the value to write >> + * @cnt: The maximum number of bytes to write to "file" >> + * @ppos: The current position in @file >> + * >> + * This function provides a write implementation for the "mode" interface >> + * to the hardware latency detector. hwlatd has different operation modes. >> + * The "none" sets the allowed cpumask for a single hwlatd thread at the >> + * startup and lets the scheduler handle the migration. The default mode is >> + * the "round-robin" one, in which a single hwlatd thread runs, migrating >> + * among the allowed CPUs in a round-robin fashion. >> + */ >> +static ssize_t hwlat_mode_write(struct file *filp, const char __user *ubuf, >> + size_t cnt, loff_t *ppos) >> +{ >> +const char *mode; >> +char buf[64]; >> +int ret; >> +int i; >> + >> +if (hwlat_busy) >> +return -EBUSY; > > So we can't switch modes while running? > As you mentioned in the patch 3/5, this limitation was added because of the running threads. But, yes, stopping and starting the tracer to re-create the threads should work as well. I will try it for the next round. > Also, with this implemented, you can remove the disable_migrate variable, > and just switch the mode to NONE when it's detected that the affinity mask > of the thread has been changed. > That was my initial intention with the NONE mode, but I feared breaking something by removing the "migrate_disable" logic. If you do not think it is a problem, I will remove the migrate disable and just change the mode. -- Daniel > -- Steve > > >> + >> +if (cnt >= sizeof(buf)) >> +return -EINVAL; >> + >> +if (copy_from_user(buf, ubuf, cnt)) >> +return -EFAULT; >> + >> +buf[cnt] = 0; >> + >> +mode = strstrip(buf); >> + >> +ret = -EINVAL; >> + >> +for (i = 0; i < MODE_MAX; i++) { >> +if (strcmp(mode, thread_mode_str[i]) == 0) { >> +hwlat_data.thread_mode = i; >> +ret = cnt; >> +} >> +} >> + >> +*ppos += cnt; >> + >> +return cnt; >> +} >> + >> + >
Re: [RFC PATCH 1/5] tracing/hwlat: Add a cpus file specific for hwlat_detector
On 4/14/21 4:10 PM, Steven Rostedt wrote: > On Thu, 8 Apr 2021 16:13:19 +0200 > Daniel Bristot de Oliveira wrote: > >> Provides a "cpus" interface to the hardware latency detector. By >> default, it lists all CPUs, allowing hwlatd threads to run on any online >> CPU of the system. >> >> It serves to restrict the execution of hwlatd to the set of CPUs writing >> via this interface. Note that hwlatd also respects the "tracing_cpumask." >> Hence, hwlatd threads will run only on the set of CPUs allowed here AND >> on "tracing_cpumask." >> >> Why not keep just "tracing_cpumask"? Because the user might be interested >> in tracing what is running on other CPUs. For instance, one might run >> hwlatd in one HT CPU while observing what is running on the sibling HT >> CPU. The cpu list format is also more intuitive. >> >> Also in preparation to the per-cpu mode. > > OK, I'm still not convinced that you couldn't use tracing_cpumask here. > Because we have instances, and tracing_cpumask is defined per instance, you > could simply do: > > # cd /sys/kernel/tracing > # mkdir instances/hwlat > # echo a > instances/hwlat/tracing_cpumask > # echo hwlat > instances/hwlat/current_tracer > > Now the tracing_cpumask above only affects the hwlat tracer. > > I'm just reluctant to add more tracing files if the current ones can be > used without too much trouble. For being intuitive, let's make user space > tools hide the nastiness of the kernel interface ;-) [discussing about the cpus file in both hwlat and osnoise here...] I see your point, but by having two different instances give you two different output "trace" files... and it is not that always practical to merge them when using only the tracefs interface (I like to use it, and it is very handy when dealing with immutable systems, on customers...). Thinking aloud, one might say: sort the two trace files by timestamp... and other might reply: but some lines do not have a timestamp associated, e.g., the stacktrace. Anyway, the cpus file on hwlat is not a super essential thing, I agree... interrupts are disabled, so not much could go wrong (although I really needed the trace from a sibling cpu in a real case). But for the osnoise tracer the cpus file is really useful. For instance, on a system with the CPU 7 isolated: - %< - # echo 7 > osnoise/cpus # echo target_cpu == 7 > events/sched/sched_wakeup/filter # echo stacktrace if target_cpu == 7 > events/sched/sched_wakeup/trigger # echo 1 > events/sched/sched_wakeup/enable # echo osnoise:thread_noise > set_event # echo osnoise > current_tracer # cat trace [find...] kworker/0:1-7 [000] d..5 1820.717780: => trace_event_raw_event_sched_wakeup_template => __traceiter_sched_wakeup => ttwu_do_wakeup => try_to_wake_up => __queue_work => queue_delayed_work_on => vmstat_shepherd => process_one_work => worker_thread => kthread => ret_from_fork kworker/7:1-410 [007] d..3 1820.717790: thread_noise: kworker/7:1:410 start 1820.717786519 duration 3626 ns osnoise/7-1000[007] 1821.582340: 100 90 99.99100 15 1 0 12 6 1 - >% - It was possible to easily find that the '1' thread noise was a kworker, dispatched from CPU 0, and that it was dispatched by "vmstat_shepherd". Also, the osnoise dir is not added to a new instance... so, it only costs "one" file... > -- Steve >
Re: [RFC PATCH 5/5] tracing: Add the osnoise tracer
On 4/8/21 5:58 PM, Jonathan Corbet wrote: > Daniel Bristot de Oliveira writes: > > A quick nit: > >> Documentation/trace/osnoise_tracer.rst | 149 ++ >> include/linux/ftrace_irq.h | 16 + >> include/trace/events/osnoise.h | 141 ++ >> kernel/trace/Kconfig | 34 + >> kernel/trace/Makefile |1 + >> kernel/trace/trace.h |9 +- >> kernel/trace/trace_entries.h | 27 + >> kernel/trace/trace_osnoise.c | 1714 >> kernel/trace/trace_output.c| 72 +- >> 9 files changed, 2159 insertions(+), 4 deletions(-) >> create mode 100644 Documentation/trace/osnoise_tracer.rst >> create mode 100644 include/trace/events/osnoise.h >> create mode 100644 kernel/trace/trace_osnoise.c > When you create a new RST file, you need to add it to an index.rst (or > similar) file so that it gets incorporated into the docs build. ack! > > The document itself looks good on a quick read. If you're making > another pass over it, you might consider reducing the ``markup noise`` a > bit; we try to keep that to a minimum in the kernel docs. But otherwise > thanks for writing it! Thanks for the review, Jon. I will reduce the `` markup (on this, and on some other docs that are about to come :-)) -- Daniel > jon >
[RFC PATCH 5/5] tracing: Add the osnoise tracer
tracer during the associated runtime. - The % OF CPU AVAILABLE reports the percentage of CPU available for the osnoise thread during the runtime window. - The MAX SINGLE NOISE IN US reports the maximum single noise observed during the runtime window. - The Interference counters display how many each of the respective interference happened during the runtime window. Note that the example above shows a high number of HW noise samples. The reason being is that this sample was taken on a virtual machine, and the host interference is detected as a hardware interference. Tracer options The tracer has a set of options inside the osnoise directory, they are: - cpus: CPUs at which a osnoise thread will execute. - period_us: the period of the osnoise thread. - runtime_us: how long an osnoise thread will look for noise. - stop_tracing_single_us: stop the system tracing of a single noise higher than the configured value is happens. Writing 0 disables this option. - stop_tracing_total_us: stop the system tracing of a NOISE IN USE higher than the configured value is happens. Writing 0 disables this option. - tolerance_ns: the minimum delta between two time() reads to be considered as noise. Additional Tracing In addition to the tracer, a set of tracepoints were added to facilitate the identification of the osnoise source. - osnoise:sample_threshold: printed anytime a noise is higher than the configurable tolerance_ns. - osnoise:nmi_noise: noise from NMI, including the duration. - osnoise:irq_noise: noise from an IRQ, including the duration. - osnoise:softirq_noise: noise from a SoftIRQ, including the duration. - osnoise:thread_noise: noise from a thread, including the duration. Note that a all the values are net values. This means that a thread duration will not contain the duration of the IRQs that happened during its execution, for example. The same is valid for all duration values. Here is one example of the usage of these tracepoints:: osnoise/8-961 [008] d.h. 5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns osnoise/8-961 [008] dNh. 5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns migration/8-54 [008] d... 5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns osnoise/8-961 [008] 5789.858413: sample_threshold: start 5789.858404555 duration 8 us interferences 2 In this example, a noise sample of 8 microseconds was reported in the last fine, pointing to two interferences. Looking backward in the trace, the two previous entries were about the migration thread running after a timer IRQ execution. The first event is not part of the noise because it took place one millisecond before. It is worth noticing that the sum of the duration reported in the tracepoints is smaller than eight us reported in the sample_threshold. The reason roots in the tracing overhead and in the overhead of the entry and exit code that happens before and after any interference execution. This justifies the dual approach: measuring thread and tracing. Cc: Jonathan Corbet Cc: Steven Rostedt Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Alexandre Chartre Cc: Clark Willaims Cc: John Kacur Cc: Juri Lelli Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira --- Documentation/trace/osnoise_tracer.rst | 149 ++ include/linux/ftrace_irq.h | 16 + include/trace/events/osnoise.h | 141 ++ kernel/trace/Kconfig | 34 + kernel/trace/Makefile |1 + kernel/trace/trace.h |9 +- kernel/trace/trace_entries.h | 27 + kernel/trace/trace_osnoise.c | 1714 kernel/trace/trace_output.c| 72 +- 9 files changed, 2159 insertions(+), 4 deletions(-) create mode 100644 Documentation/trace/osnoise_tracer.rst create mode 100644 include/trace/events/osnoise.h create mode 100644 kernel/trace/trace_osnoise.c diff --git a/Documentation/trace/osnoise_tracer.rst b/Documentation/trace/osnoise_tracer.rst new file mode 100644 index ..9a97f557317b --- /dev/null +++ b/Documentation/trace/osnoise_tracer.rst @@ -0,0 +1,149 @@ +== +OSNOISE Tracer +== + +In the context of high-performance computing (HPC), the Operating System +Noise (*osnoise*) refers to the interference experienced by an application +due to activities inside the operating system. In the context of Linux, +NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the +system. Moreover, hardware-related jobs can also cause noise, for example, +via SMIs. + +``hwlat_detector`` is one of the tools used to identify the most complex +source of noise: *hardware noise*. + +In a nutshell, the ``hwlat_detector`` creates a thread that runs +periodically for a given period. At the b
[RFC PATCH 4/5] tracing: Add __print_ns_to_secs() and __print_ns_without_secs() helpers
From: Steven Rostedt To have nanosecond output displayed in a more human readable format, its nicer to convert it to a seconds format (XXX.Y). The problem is that to do so, the numbers must be divided by NSEC_PER_SEC, and moded too. But as these numbers are 64 bit, this can not be done simply with '/' and '%' operators, but must use do_div() instead. Instead of performing the expensive do_div() in the hot path of the tracepoint, it is more efficient to perform it during the output phase. But passing in do_div() can confuse the parser, and do_div() doesn't work exactly like a normal C function. It modifies the number in place, and we don't want to modify the actual values in the ring buffer. Two helper functions are now created: __print_ns_to_secs() and __print_ns_without_secs() They both take a value of nanoseconds, and the former will return that number divided by NSEC_PER_SEC, and the latter will mod it with NSEC_PER_SEC giving a way to print a nice human readable format: __print_fmt("time=%llu.%09u", __print_ns_to_secs(REC->nsec_val), __print_ns_without_secs(REC->nsec_val)) Cc: Jonathan Corbet Cc: Steven Rostedt Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Alexandre Chartre Cc: Clark Willaims Cc: John Kacur Cc: Juri Lelli Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Steven Rostedt Signed-off-by: Daniel Bristot de Oliveira --- include/trace/trace_events.h | 25 + 1 file changed, 25 insertions(+) diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h index 8268bf747d6f..c60fd1037b91 100644 --- a/include/trace/trace_events.h +++ b/include/trace/trace_events.h @@ -33,6 +33,21 @@ static const char TRACE_SYSTEM_STRING[] = \ __stringify(TRACE_SYSTEM) +#undef __print_ns_to_secs +#define __print_ns_to_secs(value) \ + ({ \ + u64 val = (u64)value; \ + do_div(val, NSEC_PER_SEC); \ + val;\ + }) + +#undef __print_ns_without_secs +#define __print_ns_without_secs(value) \ + ({ \ + u64 val = (u64)value; \ + (u32) do_div(val, NSEC_PER_SEC);\ + }) + TRACE_MAKE_SYSTEM_STR(); #undef TRACE_DEFINE_ENUM @@ -736,6 +751,16 @@ static inline void ftrace_test_probe_##call(void) \ #undef __print_array #undef __print_hex_dump +/* + * The below is not executed in the kernel. It is only what is + * displayed in the print format for userspace to parse. + */ +#undef __print_ns_to_secs +#define __print_ns_to_secs(val) val / 10UL + +#undef __print_ns_without_secs +#define __print_ns_without_secs(val) val % 10UL + #undef TP_printk #define TP_printk(fmt, args...) "\"" fmt "\", " __stringify(args) -- 2.30.2
[RFC PATCH 3/5] tracing/hwlat: Implement the per-cpu mode
Implements the per-cpu mode in which a sampling thread is created for each cpu in the "cpus" (and tracing_mask). The per-cpu mode has the potention to speed up the hwlat detection by running on multiple CPUs at the same time. Cc: Jonathan Corbet Cc: Steven Rostedt Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Alexandre Chartre Cc: Clark Willaims Cc: John Kacur Cc: Juri Lelli Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira --- Documentation/trace/hwlat_detector.rst | 6 +- kernel/trace/trace_hwlat.c | 171 +++-- 2 files changed, 137 insertions(+), 40 deletions(-) diff --git a/Documentation/trace/hwlat_detector.rst b/Documentation/trace/hwlat_detector.rst index f63fdd867598..7a6fab105b29 100644 --- a/Documentation/trace/hwlat_detector.rst +++ b/Documentation/trace/hwlat_detector.rst @@ -85,10 +85,12 @@ the available options are: - none:do not force migration - round-robin: migrate across each CPU specified in cpus between each window + - per-cpu: create a per-cpu thread for each cpu in cpus By default, hwlat detector will also obey the tracing_cpumask, so the thread will be placed only in the set of cpus that is both on the hwlat detector's cpus and in the global tracing_cpumask file. The user can overwrite the cpumask by setting it manually. Changing the hwlatd affinity externally, -e.g., via taskset tool, will disable the round-robin migration. - +e.g., via taskset tool, will disable the round-robin migration. In the +per-cpu mode, the per-cpu thread (hwlatd/CPU) will be pinned to its relative +cpu, and its affinity cannot be changed. diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c index 3818200c9e24..52968ea312df 100644 --- a/kernel/trace/trace_hwlat.c +++ b/kernel/trace/trace_hwlat.c @@ -34,7 +34,7 @@ * Copyright (C) 2008-2009 Jon Masters, Red Hat, Inc. * Copyright (C) 2013-2016 Steven Rostedt, Red Hat, Inc. * - * Includes useful feedback from Clark Williams + * Includes useful feedback from Clark Williams * */ #include @@ -54,9 +54,6 @@ static struct trace_array *hwlat_trace; #define DEFAULT_SAMPLE_WIDTH 50 /* 0.5s */ #define DEFAULT_LAT_THRESHOLD 10 /* 10us */ -/* sampling thread*/ -static struct task_struct *hwlat_kthread; - static struct dentry *hwlat_sample_width; /* sample width us */ static struct dentry *hwlat_sample_window; /* sample window us */ static struct dentry *hwlat_cpumask_dentry;/* hwlat cpus allowed */ @@ -65,19 +62,27 @@ static struct dentry *hwlat_thread_mode;/* hwlat thread mode */ enum { MODE_NONE = 0, MODE_ROUND_ROBIN, + MODE_PER_CPU, MODE_MAX }; -static char *thread_mode_str[] = { "none", "round-robin" }; +static char *thread_mode_str[] = { "none", "round-robin", "per-cpu" }; /* Save the previous tracing_thresh value */ static unsigned long save_tracing_thresh; -/* NMI timestamp counters */ -static u64 nmi_ts_start; -static u64 nmi_total_ts; -static int nmi_count; -static int nmi_cpu; +/* runtime kthread data */ +struct hwlat_kthread_data { + struct task_struct *kthread; + /* NMI timestamp counters */ + u64 nmi_ts_start; + u64 nmi_total_ts; + int nmi_count; + int nmi_cpu; +}; + +struct hwlat_kthread_data hwlat_single_cpu_data; +DEFINE_PER_CPU(struct hwlat_kthread_data, hwlat_per_cpu_data); /* Tells NMIs to call back to the hwlat tracer to record timestamps */ bool trace_hwlat_callback_enabled; @@ -114,6 +119,14 @@ static struct hwlat_data { .thread_mode= MODE_ROUND_ROBIN }; +struct hwlat_kthread_data *get_cpu_data(void) +{ + if (hwlat_data.thread_mode == MODE_PER_CPU) + return this_cpu_ptr(_per_cpu_data); + else + return _single_cpu_data; +} + static bool hwlat_busy; static void trace_hwlat_sample(struct hwlat_sample *sample) @@ -151,7 +164,9 @@ static void trace_hwlat_sample(struct hwlat_sample *sample) void trace_hwlat_callback(bool enter) { - if (smp_processor_id() != nmi_cpu) + struct hwlat_kthread_data *kdata = get_cpu_data(); + + if (kdata->kthread) return; /* @@ -160,13 +175,13 @@ void trace_hwlat_callback(bool enter) */ if (!IS_ENABLED(CONFIG_GENERIC_SCHED_CLOCK)) { if (enter) - nmi_ts_start = time_get(); + kdata->nmi_ts_start = time_get(); else - nmi_total_ts += time_get() - nmi_ts_start; + kdata->nmi_total_ts += time_get() - kdata->nmi_ts_start; } if (enter) - nmi_count++; + kdata->nmi_count++; } /** @@ -178,6 +193,7 @@ void trace_hwlat_callback(bool enter)
[RFC PATCH 2/5] tracing/hwlat: Implement the mode config option
Provides the "mode" config to the hardware latency detector. hwlatd has two different operation modes. The default mode is the "round-robin" one, in which a single hwlatd thread runs, migrating among the allowed CPUs in a "round-robin" fashion. This is the current behavior. The "none" sets the allowed cpumask for a single hwlatd thread at the startup, but skips the round-robin, letting the scheduler handle the migration. In preparation to the per-cpu mode. Cc: Jonathan Corbet Cc: Steven Rostedt Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Alexandre Chartre Cc: Clark Willaims Cc: John Kacur Cc: Juri Lelli Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira --- Documentation/trace/hwlat_detector.rst | 21 +++- kernel/trace/trace_hwlat.c | 157 +++-- 2 files changed, 162 insertions(+), 16 deletions(-) diff --git a/Documentation/trace/hwlat_detector.rst b/Documentation/trace/hwlat_detector.rst index 86f973a7763c..f63fdd867598 100644 --- a/Documentation/trace/hwlat_detector.rst +++ b/Documentation/trace/hwlat_detector.rst @@ -76,10 +76,19 @@ in /sys/kernel/tracing: - hwlat_detector/width - specified amount of time to spin within window (usecs) - hwlat_detector/window- amount of time between (width) runs (usecs) - hwlat_detector/cpus - the CPUs to move the hwlat thread across + - hwlat_detector/mode - the thread mode + +By default, the hwlat detector's kernel thread will migrate across each CPU +specified in cpumask at the beginning of a new window, in a round-robin +fashion. This behavior can be changed by changing the thread mode, +the available options are: + + - none:do not force migration + - round-robin: migrate across each CPU specified in cpus between each window + +By default, hwlat detector will also obey the tracing_cpumask, so the thread +will be placed only in the set of cpus that is both on the hwlat detector's +cpus and in the global tracing_cpumask file. The user can overwrite the +cpumask by setting it manually. Changing the hwlatd affinity externally, +e.g., via taskset tool, will disable the round-robin migration. -The hwlat detector's kernel thread will migrate across each CPU specified in -cpus list between each window. The hwlat detector will also obey the -tracing_cpumask, so the thread will migrate on the set of cpus that is -both on its cpus list and in the global tracing_cpumask file. -To limit the migration, either modify cpumask, or modify the hwlat kernel -thread (named [hwlatd]) CPU affinity directly, and the migration will stop. diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c index deecb93f97f2..3818200c9e24 100644 --- a/kernel/trace/trace_hwlat.c +++ b/kernel/trace/trace_hwlat.c @@ -60,6 +60,15 @@ static struct task_struct *hwlat_kthread; static struct dentry *hwlat_sample_width; /* sample width us */ static struct dentry *hwlat_sample_window; /* sample window us */ static struct dentry *hwlat_cpumask_dentry;/* hwlat cpus allowed */ +static struct dentry *hwlat_thread_mode; /* hwlat thread mode */ + +enum { + MODE_NONE = 0, + MODE_ROUND_ROBIN, + MODE_MAX +}; + +static char *thread_mode_str[] = { "none", "round-robin" }; /* Save the previous tracing_thresh value */ static unsigned long save_tracing_thresh; @@ -97,11 +106,16 @@ static struct hwlat_data { u64 sample_window; /* total sampling window (on+off) */ u64 sample_width; /* active sampling portion of window */ + int thread_mode;/* thread mode */ + } hwlat_data = { .sample_window = DEFAULT_SAMPLE_WINDOW, .sample_width = DEFAULT_SAMPLE_WIDTH, + .thread_mode= MODE_ROUND_ROBIN }; +static bool hwlat_busy; + static void trace_hwlat_sample(struct hwlat_sample *sample) { struct trace_array *tr = hwlat_trace; @@ -337,7 +351,8 @@ static int kthread_fn(void *data) while (!kthread_should_stop()) { - move_to_next_cpu(); + if (hwlat_data.thread_mode == MODE_ROUND_ROBIN) + move_to_next_cpu(); local_irq_disable(); get_sample(); @@ -375,6 +390,14 @@ static int start_kthread(struct trace_array *tr) if (hwlat_kthread) return 0; + + kthread = kthread_create(kthread_fn, NULL, "hwlatd"); + if (IS_ERR(kthread)) { + pr_err(BANNER "could not start sampling thread\n"); + return -ENOMEM; + } + + /* Just pick the first CPU on first iteration */ get_online_cpus(); /* @@ -386,16 +409,14 @@ static int start_kthread(struct trace_array *tr) */ cpumask_and(current_mask, cpu_online_mask, current_mask); put_online_cpus(); -
[RFC PATCH 1/5] tracing/hwlat: Add a cpus file specific for hwlat_detector
Provides a "cpus" interface to the hardware latency detector. By default, it lists all CPUs, allowing hwlatd threads to run on any online CPU of the system. It serves to restrict the execution of hwlatd to the set of CPUs writing via this interface. Note that hwlatd also respects the "tracing_cpumask." Hence, hwlatd threads will run only on the set of CPUs allowed here AND on "tracing_cpumask." Why not keep just "tracing_cpumask"? Because the user might be interested in tracing what is running on other CPUs. For instance, one might run hwlatd in one HT CPU while observing what is running on the sibling HT CPU. The cpu list format is also more intuitive. Also in preparation to the per-cpu mode. Cc: Jonathan Corbet Cc: Steven Rostedt Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Alexandre Chartre Cc: Clark Willaims Cc: John Kacur Cc: Juri Lelli Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira --- Documentation/trace/hwlat_detector.rst | 14 +-- kernel/trace/trace_hwlat.c | 125 - 2 files changed, 131 insertions(+), 8 deletions(-) diff --git a/Documentation/trace/hwlat_detector.rst b/Documentation/trace/hwlat_detector.rst index 5739349649c8..86f973a7763c 100644 --- a/Documentation/trace/hwlat_detector.rst +++ b/Documentation/trace/hwlat_detector.rst @@ -73,11 +73,13 @@ in /sys/kernel/tracing: - tracing_threshold - minimum latency value to be considered (usecs) - tracing_max_latency - maximum hardware latency actually observed (usecs) - - tracing_cpumask - the CPUs to move the hwlat thread across - - hwlat_detector/width- specified amount of time to spin within window (usecs) - - hwlat_detector/window - amount of time between (width) runs (usecs) + - hwlat_detector/width - specified amount of time to spin within window (usecs) + - hwlat_detector/window- amount of time between (width) runs (usecs) + - hwlat_detector/cpus - the CPUs to move the hwlat thread across The hwlat detector's kernel thread will migrate across each CPU specified in -tracing_cpumask between each window. To limit the migration, either modify -tracing_cpumask, or modify the hwlat kernel thread (named [hwlatd]) CPU -affinity directly, and the migration will stop. +cpus list between each window. The hwlat detector will also obey the +tracing_cpumask, so the thread will migrate on the set of cpus that is +both on its cpus list and in the global tracing_cpumask file. +To limit the migration, either modify cpumask, or modify the hwlat kernel +thread (named [hwlatd]) CPU affinity directly, and the migration will stop. diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c index 34dc1a712dcb..deecb93f97f2 100644 --- a/kernel/trace/trace_hwlat.c +++ b/kernel/trace/trace_hwlat.c @@ -59,6 +59,7 @@ static struct task_struct *hwlat_kthread; static struct dentry *hwlat_sample_width; /* sample width us */ static struct dentry *hwlat_sample_window; /* sample window us */ +static struct dentry *hwlat_cpumask_dentry;/* hwlat cpus allowed */ /* Save the previous tracing_thresh value */ static unsigned long save_tracing_thresh; @@ -272,6 +273,7 @@ static int get_sample(void) return ret; } +static struct cpumask hwlat_cpumask; static struct cpumask save_cpumask; static bool disable_migrate; @@ -292,7 +294,14 @@ static void move_to_next_cpu(void) goto disable; get_online_cpus(); - cpumask_and(current_mask, cpu_online_mask, tr->tracing_cpumask); + /* +* Run only on CPUs in which trace and hwlat are allowed to run. +*/ + cpumask_and(current_mask, tr->tracing_cpumask, _cpumask); + /* +* And the CPU is online. +*/ + cpumask_and(current_mask, cpu_online_mask, current_mask); next_cpu = cpumask_next(smp_processor_id(), current_mask); put_online_cpus(); @@ -368,7 +377,14 @@ static int start_kthread(struct trace_array *tr) /* Just pick the first CPU on first iteration */ get_online_cpus(); - cpumask_and(current_mask, cpu_online_mask, tr->tracing_cpumask); + /* +* Run only on CPUs in which trace and hwlat are allowed to run. +*/ + cpumask_and(current_mask, tr->tracing_cpumask, _cpumask); + /* +* And the CPU is online. +*/ + cpumask_and(current_mask, cpu_online_mask, current_mask); put_online_cpus(); next_cpu = cpumask_first(current_mask); @@ -402,6 +418,94 @@ static void stop_kthread(void) hwlat_kthread = NULL; } +/* + * hwlat_cpus_read - Read function for reading the "cpus" file + * @filp: The active open file structure + * @ubuf: The userspace provided buffer to read value into + * @cnt: The maximum number of bytes to read + * @ppos: The current "file" position + * + * Prin
[RFC PATCH 0/5] hwlat improvements and osnoise tracer
This series proposes a set of improvements and new features for the tracing subsystem to facilitate the debugging of low latency deployments. Currently, hwlat runs on a single CPU at a time, migrating across a set of CPUs in a round-robin fashion. The first three patches are changes made to allow hwlat to run on multiple CPUs in parallel, increasing the chances of detecting a hardware latency. The fourth patch is a helper to print a timestamp in a u64 in seconds.nanoseconds format on tracepoints. The fifth patch proposes a new tracer named osnoise and aims to help users of isolcpus= (or a similar method) to measure how much noise the OS and the hardware add to the isolated application. The osnoise tracer bases on the hwlat detector code. The difference is that, instead of sampling with interrupts disabled, the osnoise tracer samples the CPU with interrupts and preemption enabled. In this way, the sampling thread will suffer any source of noise from the OS. The detection and classification of the type of noise are then made by observing the entry points of NMIs, IRQs, SoftIRQs, and threads. If none of these sources of noise is detected, the tool associates the noise with the hardware. The tool periodically prints a status, printing the total noise of the period, the max single noise observed, the percentage of CPU available for the task, along with the counters of each source of the noise. To debug the sources of noise, the tracer also adds a set of tracepoints that print any NMI, IRQ, SofIRQ, and thread occurrence. These tracepoints print the starting time and the noise's net duration at the end of the noise. In this way, it reduces the number of tracepoints (one instead of two) and the need to manually accounting the contribution of each noise independently. Daniel Bristot de Oliveira (4): tracing/hwlat: Add a cpus file specific for hwlat_detector tracing/hwlat: Implement the mode config option tracing/hwlat: Implement the per-cpu mode tracing: Add the osnoise tracer Steven Rostedt (1): tracing: Add __print_ns_to_secs() and __print_ns_without_secs() helpers Documentation/trace/hwlat_detector.rst | 29 +- Documentation/trace/osnoise_tracer.rst | 149 ++ include/linux/ftrace_irq.h | 16 + include/trace/events/osnoise.h | 141 ++ include/trace/trace_events.h | 25 + kernel/trace/Kconfig | 34 + kernel/trace/Makefile |1 + kernel/trace/trace.h |9 +- kernel/trace/trace_entries.h | 27 + kernel/trace/trace_hwlat.c | 445 +- kernel/trace/trace_osnoise.c | 1714 kernel/trace/trace_output.c| 72 +- 12 files changed, 2604 insertions(+), 58 deletions(-) create mode 100644 Documentation/trace/osnoise_tracer.rst create mode 100644 include/trace/events/osnoise.h create mode 100644 kernel/trace/trace_osnoise.c -- 2.30.2
Re: [PATCH] sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
On 1/19/21 9:35 AM, Dietmar Eggemann wrote: > dl_add_task_root_domain() is called during sched domain rebuild: > > rebuild_sched_domains_locked() > partition_and_rebuild_sched_domains() > rebuild_root_domains() > for all top_cpuset descendants: >update_tasks_root_domain() > for all tasks of cpuset: >dl_add_task_root_domain() > > Change it so that only the task pi lock is taken to check if the task > has a SCHED_DEADLINE (DL) policy. In case that p is a DL task take the > rq lock as well to be able to safely de-reference root domain's DL > bandwidth structure. > > Most of the tasks will have another policy (namely SCHED_NORMAL) and > can now bail without taking the rq lock. > > One thing to note here: Even in case that there aren't any DL user > tasks, a slow frequency switching system with cpufreq gov schedutil has > a DL task (sugov) per frequency domain running which participates in DL > bandwidth management. > > Reviewed-by: Quentin Perret > Signed-off-by: Dietmar Eggemann Reviewed-by: Daniel Bristot de Oliveira Thanks! -- Daniel
BUG: sleeping function called from invalid context at kernel/stop_machine.c:135
Hi Waiman, Are you aware of this issue: - %< - [ 88.307857] BUG: sleeping function called from invalid context at kernel/stop_machine.c:135 [ 88.308796] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 801, name: sh [ 88.309785] 6 locks held by sh/801: [ 88.310265] #0: 9f008c575460 (sb_writers#7){.+.+}-{0:0}, at: ksys_write+0x58/0xd0 [ 88.310906] #1: 9f008e9dd088 (>mutex){+.+.}-{4:4}, at: kernfs_fop_write+0xa5/0x1c0 [ 88.311672] #2: 9f0092164a88 (kn->active#195){.+.+}-{0:0}, at: kernfs_fop_write+0xad/0x1c0 [ 88.312456] #3: bac68310 (cpu_hotplug_lock){}-{0:0}, at: sched_partition_write+0x72/0x2f0 [ 88.313280] #4: bae37090 (_rwsem){}-{0:0}, at: sched_partition_write+0x7e/0x2f0 [ 88.314095] #5: bad89140 (rcu_read_lock){}-{1:3}, at: update_sibling_cpumasks+0x5/0x140 [ 88.314806] Preemption disabled at: [ 88.314810] [] preempt_schedule_thunk+0x16/0x18 [ 88.315815] CPU: 1 PID: 801 Comm: sh Not tainted 5.10.0-rc5+ #10 [ 88.316203] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 88.316714] Call Trace: [ 88.316875] dump_stack+0x8b/0xb0 [ 88.317087] ___might_sleep.cold+0x102/0x116 [ 88.317354] stop_one_cpu+0x82/0xa0 [ 88.317578] ? set_cpus_allowed_ptr+0x10/0x10 [ 88.317858] __set_cpus_allowed_ptr+0x1e6/0x1f0 [ 88.318144] update_tasks_cpumask+0x25/0x50 [ 88.318415] update_cpumasks_hier+0x257/0x840 [ 88.318687] update_sibling_cpumasks+0x96/0x140 [ 88.318968] update_prstate+0x1a0/0x1f0 [ 88.319210] sched_partition_write+0x9f/0x2f0 [ 88.319482] kernfs_fop_write+0xdc/0x1c0 [ 88.319730] vfs_write+0xea/0x3b0 [ 88.319943] ksys_write+0x58/0xd0 [ 88.320156] do_syscall_64+0x33/0x40 [ 88.320382] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 88.320692] RIP: 0033:0x7fbbd79be537 [ 88.320915] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 88.322028] RSP: 002b:7ffd44cc8398 EFLAGS: 0246 ORIG_RAX: 0001 [ 88.322479] RAX: ffda RBX: 0005 RCX: 7fbbd79be537 [ 88.322910] RDX: 0005 RSI: 558ae69200a0 RDI: 0001 [ 88.323342] RBP: 558ae69200a0 R08: 000a R09: 0004 [ 88.323775] R10: 558ae6921ba0 R11: 0246 R12: 0005 [ 88.325046] R13: 7fbbd7a90500 R14: 0005 R15: 7fbbd7a90700 - >% - it happens when I run this (from Dietmar): - %< - cd /sys/fs/cgroup/ echo $$ > cgroup.procs echo +cpuset > cgroup.subtree_control chrt -d --sched-period 10 --sched-runtime 1 0 sleep 500 & PID1=$! chrt -d --sched-period 10 --sched-runtime 1 0 sleep 500 & PID2=$! sleep 3 mkdir A echo 0 > ./A/cpuset.mems echo 0 > ./A/cpuset.cpus echo $PID2 > ./A/cgroup.procs echo root > ./A/cpuset.cpus.partition echo $PID2 > ./A/cgroup.procs cat /proc/$PID1/status | grep Cpus_allowed_list | awk '{print $2}' cat /proc/$PID2/status | grep Cpus_allowed_list | awk '{print $2}' - >% - with this kernel config: https://bristot.me/lkml/20200119/config.txt on Fedora 32 (x86 64 vm). full dmesg here: https://bristot.me/lkml/20200119/dmesg.txt -- Daniel
Re: [PATCH 4/6] sched/deadline: Block DL tasks on non-exclusive cpuset if bandwitdh control is enable
On 1/14/21 4:51 PM, Dietmar Eggemann wrote: > On 12/01/2021 16:53, Daniel Bristot de Oliveira wrote: >> The current SCHED_DEADLINE design supports only global scheduler, >> or variants of it, i.e., clustered and partitioned, via cpuset config. >> To enable the partitioning of a system with clusters of CPUs, the >> documentation advises the usage of exclusive cpusets, creating an >> exclusive root_domain for the cpuset. >> >> Attempts to change the cpu affinity of a thread to a cpu mask different >> from the root domain results in an error. For instance: > > [...] > >> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c >> index 788a391657a5..c221e14d5b86 100644 >> --- a/kernel/sched/deadline.c >> +++ b/kernel/sched/deadline.c >> @@ -2878,6 +2878,13 @@ int dl_task_can_attach(struct task_struct *p, >> if (cpumask_empty(cs_cpus_allowed)) >> return 0; >> >> +/* >> + * Do not allow moving tasks to non-exclusive cpusets >> + * if bandwidth control is enabled. >> + */ >> +if (dl_bandwidth_enabled() && !exclusive) >> +return -EBUSY; >> + >> /* >> * The task is not moving to another root domain, so it is >> * already accounted. >> > > But doesn't this mean you only have to set this cgroup exclusive/root to > run into the same issue: > > with this patch: > > cgroupv1: > > root@juno:/sys/fs/cgroup/cpuset# chrt -d --sched-period 10 > --sched-runtime 1 0 sleep 500 & > [1] 1668 > root@juno:/sys/fs/cgroup/cpuset# PID1=$! > > root@juno:/sys/fs/cgroup/cpuset# chrt -d --sched-period 10 > --sched-runtime 1 0 sleep 500 & > [2] 1669 > root@juno:/sys/fs/cgroup/cpuset# PID2=$! > > root@juno:/sys/fs/cgroup/cpuset# mkdir A > > root@juno:/sys/fs/cgroup/cpuset# echo 0 > ./A/cpuset.mems > root@juno:/sys/fs/cgroup/cpuset# echo 0 > ./A/cpuset.cpus > > root@juno:/sys/fs/cgroup/cpuset# echo $PID2 > ./A/cgroup.procs > -bash: echo: write error: Device or resource busy > > root@juno:/sys/fs/cgroup/cpuset# echo 1 > ./A/cpuset.cpu_exclusive > > root@juno:/sys/fs/cgroup/cpuset# echo $PID2 > ./A/cgroup.procs > > root@juno:/sys/fs/cgroup/cpuset# cat /proc/$PID1/status | grep > Cpus_allowed_list | awk '{print $2}' > 0-5 > root@juno:/sys/fs/cgroup/cpuset# cat /proc/$PID2/status | grep > Cpus_allowed_list | awk '{print $2}' > 0 On CPU v1 we also need to disable the load balance to create a root domain, right? > cgroupv2: Yeah, I see your point. I was seeing a different output because of Fedora default's behavior of adding the tasks to the system.slice/user.slice... doing: > root@juno:/sys/fs/cgroup# echo +cpuset > cgroup.subtree_control # echo $$ > cgroup.procs > root@juno:/sys/fs/cgroup# chrt -d --sched-period 10 > --sched-runtime 1 0 sleep 500 & > [1] 1687 > root@juno:/sys/fs/cgroup# PID1=$! > > root@juno:/sys/fs/cgroup# chrt -d --sched-period 10 > --sched-runtime 1 0 sleep 500 & > [2] 1688 > root@juno:/sys/fs/cgroup# PID2=$! > > root@juno:/sys/fs/cgroup# mkdir A > > root@juno:/sys/fs/cgroup# echo 0 > ./A/cpuset.mems > root@juno:/sys/fs/cgroup# echo 0 > ./A/cpuset.cpus > > root@juno:/sys/fs/cgroup# echo $PID2 > ./A/cgroup.procs > -bash: echo: write error: Device or resource busy > > root@juno:/sys/fs/cgroup# echo root > ./A/cpuset.cpus.partition > > root@juno:/sys/fs/cgroup# echo $PID2 > ./A/cgroup.procs > > root@juno:/sys/fs/cgroup# cat /proc/$PID1/status | grep > Cpus_allowed_list | awk '{print $2}' > 0-5 > root@juno:/sys/fs/cgroup# cat /proc/$PID2/status | grep > Cpus_allowed_list | awk '{print $2}' > 0 makes me see the same behavior. This will require further analysis. So, my plan now is to split this patch set into two, one with patches 1,3,5, and 6, which fixes the most "easy" part of the problems, and another with 2 and 4 that will require further investigation (discussed this with Dietmar already). Thoughts? -- Daniel
Re: [PATCH 6/6] sched/deadline: Fixes cpu/rd/dl_bw references for suspended tasks
On 1/15/21 3:36 PM, Dietmar Eggemann wrote: > On 12/01/2021 16:53, Daniel Bristot de Oliveira wrote: > > [...] > >> - %< - >> #!/bin/bash >> # Enter on the cgroup directory >> cd /sys/fs/cgroup/ >> >> # Check it if is cgroup v2 and enable cpuset >> if [ -e cgroup.subtree_control ]; then >> # Enable cpuset controller on cgroup v2 >> echo +cpuset > cgroup.subtree_control >> fi >> >> echo LOG: create an exclusive cpuset and assigned the CPU 0 to it >> # Create cpuset groups >> rmdir dl-group &> /dev/null >> mkdir dl-group >> >> # Restrict the task to the CPU 0 >> echo 0 > dl-group/cpuset.mems >> echo 0 > dl-group/cpuset.cpus >> echo root > dl-group/cpuset.cpus.partition >> >> echo LOG: dispatching a regular task >> sleep 100 & >> CPUSET_PID="$!" >> >> # let it settle down >> sleep 1 >> >> # Assign the second task to the cgroup > > There is only one task 'CPUSET_PID' involved here? Ooops, yep, I will remove the "second" part. >> echo LOG: moving the second DL task to the cpuset >> echo "$CPUSET_PID" > dl-group/cgroup.procs 2> /dev/null >> >> CPUSET_ALLOWED=`cat /proc/$CPUSET_PID/status | grep Cpus_allowed_list | >> awk '{print $2}'` >> >> chrt -p -d --sched-period 10 --sched-runtime 1 0 >> $CPUSET_PID >> ACCEPTED=$? >> >> if [ $ACCEPTED == 0 ]; then >> echo PASS: the task became DL >> else >> echo FAIL: the task was rejected as DL >> fi > > [...] > >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 5961a97541c2..3c2775e6869f 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -5905,15 +5905,15 @@ static int __sched_setscheduler(struct task_struct >> *p, >> #ifdef CONFIG_SMP >> if (dl_bandwidth_enabled() && dl_policy(policy) && >> !(attr->sched_flags & SCHED_FLAG_SUGOV)) { >> -cpumask_t *span = rq->rd->span; >> +struct root_domain *rd = dl_task_rd(p); >> >> /* >> * Don't allow tasks with an affinity mask smaller than >> * the entire root_domain to become SCHED_DEADLINE. We >> * will also fail if there's no bandwidth available. >> */ >> -if (!cpumask_subset(span, p->cpus_ptr) || >> -rq->rd->dl_bw.bw == 0) { >> +if (!cpumask_subset(rd->span, p->cpus_ptr) || >> +rd->dl_bw.bw == 0) { >> retval = -EPERM; >> goto unlock; >> } >> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c >> index c221e14d5b86..1f6264cb8867 100644 >> --- a/kernel/sched/deadline.c >> +++ b/kernel/sched/deadline.c >> @@ -2678,8 +2678,8 @@ int sched_dl_overflow(struct task_struct *p, int >> policy, >> u64 period = attr->sched_period ?: attr->sched_deadline; >> u64 runtime = attr->sched_runtime; >> u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0; >> -int cpus, err = -1, cpu = task_cpu(p); >> -struct dl_bw *dl_b = dl_bw_of(cpu); >> +int cpus, err = -1, cpu = dl_task_cpu(p); >> +struct dl_bw *dl_b = dl_task_root_bw(p); >> unsigned long cap; >> >> if (attr->sched_flags & SCHED_FLAG_SUGOV) >> > > Wouldn't it be sufficient to just introduce dl_task_cpu() and use the > correct cpu to get rd->span or struct dl_bw in __sched_setscheduler() > -> sched_dl_overflow()? While I think that having the dl_task_rd() makes the code more intuitive, I have no problem on not adding it... > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 5961a97541c2..0573f676696a 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -5905,7 +5905,8 @@ static int __sched_setscheduler(struct task_struct *p, > #ifdef CONFIG_SMP > if (dl_bandwidth_enabled() && dl_policy(policy) && > !(attr->sched_flags & SCHED_FLAG_SUGOV)) { > - cpumask_t *span = rq->rd->span; > + int cpu = dl_task_cpu(p); > + cpumask_t *span = cpu_rq(cpu)->rd->span; > &g
Re: [PATCH 3/6] sched/deadline: Allow DL tasks on empty (cgroup v2) cpusets
On 1/14/21 1:12 PM, Dietmar Eggemann wrote: > On 12/01/2021 16:53, Daniel Bristot de Oliveira wrote: >> cgroups v2 allows the cpuset controller to be enabled/disabled on >> demand. On Fedora 32, cpuset is disabled by default. To enable it, >> a user needs to: >> >> # cd /sys/fs/cgroup/ >> # echo +cpuset > cgroup.subtree_control >> >> Existing cgroups will expose the cpuset interface (e.g., cpuset.cpus >> file). By default, cpuset.cpus has no CPU assigned, which means that >> existing tasks will move to a cpuset without cpus. >> >> With that in mind, look what happens if a SCHED_DEADLINE task exists >> on any cgroup (user.slice by default on Fedora): >> >> - %< - >> # chrt -d --sched-period 10 --sched-runtime 1 0 sleep 100 & > > Like you mentioned above, to see the issue the DL task has to be moved > into the cgroup (e.g. user.slice) here: > > echo $PID > /sys/fs/cgroup/user.slice/cgroup.procs Ops, I "forgot" to add this step because Fedora/systemd does it by default. I will add this to the log. >> # cd /sys/fs/cgroup/ >> # echo '+cpuset' > cgroup.subtree_control >> [ 65.384041] BUG: unable to handle page fault for address: >> b720f7e0 >> [ 65.384551] #PF: supervisor read access in kernel mode >> [ 65.384923] #PF: error_code(0x) - not-present page >> [ 65.385298] PGD 61a15067 P4D 61a15067 PUD 61a16063 PMD 800f9ddff062 >> [ 65.385781] Oops: [#1] SMP PTI >> [ 65.386042] CPU: 0 PID: 799 Comm: sh Not tainted 5.10.0-rc3 #1 >> [ 65.386461] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS >> 1.13.0-2.fc32 04/01/2014 >> [ 65.387077] RIP: 0010:dl_task_can_attach+0x40/0x250 >> [ 65.387429] Code: 54 55 53 48 83 ec 18 48 89 3c 24 bf ff ff ff ff e8 05 >> a2 52 00 >>4c 63 f0 48 c7 c5 00 9e 02 00 4a 8b 04 f5 00 09 47 b6 >> 48 89 ea >><4c> 8b a4 10 e0 09 00 00 49 8d 44 24 40 48 89 c7 48 >> 89 44 24 >>08 e8 >> [ 65.388768] RSP: 0018:aee8c056fcd8 EFLAGS: 00010283 >> [ 65.389148] RAX: b71e5000 RBX: aee8c056fdd0 RCX: >> 0040 >> [ 65.389661] RDX: 00029e00 RSI: 9db202534e48 RDI: >> b6d3a3e0 >> [ 65.390174] RBP: 00029e00 R08: R09: >> 0004 >> [ 65.390686] R10: 0001 R11: ffa6fbff R12: >> aee8c056fbf0 >> [ 65.391196] R13: 9db2024e1400 R14: 0004 R15: >> 9db20ebb31e0 >> [ 65.391710] FS: 7f6df41b1740() GS:9db377c0() >> knlGS: >> [ 65.392289] CS: 0010 DS: ES: CR0: 80050033 >> [ 65.392705] CR2: b720f7e0 CR3: 00010680a003 CR4: >> 00370ef0 >> [ 65.393220] DR0: DR1: DR2: >> >> [ 65.393732] DR3: DR6: fffe0ff0 DR7: >> 0400 >> [ 65.394244] Call Trace: >> [ 65.394437] cpuset_can_attach+0x8b/0x110 >> [ 65.394732] cgroup_migrate_execute+0x70/0x430 >> [ 65.395057] cgroup_update_dfl_csses+0x222/0x230 >> [ 65.395392] cgroup_subtree_control_write+0x2c6/0x3c0 >> [ 65.395759] kernfs_fop_write+0xce/0x1b0 >> [ 65.396048] vfs_write+0xc2/0x230 >> [ 65.396291] ksys_write+0x4f/0xc0 >> [ 65.396533] do_syscall_64+0x33/0x40 >> [ 65.396797] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> [ 65.397166] RIP: 0033:0x7f6df42a6537 >> [ 65.397428] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f >> 00 f3 0f >>1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 >> 00 0f 05 >><48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 >> 18 48 89 >>74 24 >> [ 65.398766] RSP: 002b:7ffee4128018 EFLAGS: 0246 ORIG_RAX: >> 0001 >> [ 65.399838] RAX: ffda RBX: 0008 RCX: >> 7f6df42a6537 >> [ 65.400923] RDX: 0008 RSI: 55b3f7e549e0 RDI: >> 0001 >> [ 65.402003] RBP: 55b3f7e549e0 R08: 000a R09: >> 0007 >> [ 65.403082] R10: 0004 R11: 0246 R12: >> 0008 >> [ 65.404156] R13: 7f6df4378500 R14: 0008 R15: >> 7f6df4378700 >> [ 65.405218] Modules linked in: >> [ 65.414172] C
[PATCH 6/6] sched/deadline: Fixes cpu/rd/dl_bw references for suspended tasks
__set_cpus_allowed_ptr() migrates running or runnable, setting the task's cpu accordingly. The CPU is not set when the task is not runnable because of complications on the hotplug code. The task cpu will be updated in the next wakeup anyway. However, this creates a problem for the usage of task_cpu(p), which might point the task to a CPU in which it cannot run, or worse, a runqueue/root_domain it does not belong to, causing some odd errors. For example, the script below shows that a sleeping task cannot become SCHED_DEADLINE if they moved to another (exclusive) cpuset: - %< - #!/bin/bash # Enter on the cgroup directory cd /sys/fs/cgroup/ # Check it if is cgroup v2 and enable cpuset if [ -e cgroup.subtree_control ]; then # Enable cpuset controller on cgroup v2 echo +cpuset > cgroup.subtree_control fi echo LOG: create an exclusive cpuset and assigned the CPU 0 to it # Create cpuset groups rmdir dl-group &> /dev/null mkdir dl-group # Restrict the task to the CPU 0 echo 0 > dl-group/cpuset.mems echo 0 > dl-group/cpuset.cpus echo root > dl-group/cpuset.cpus.partition echo LOG: dispatching a regular task sleep 100 & CPUSET_PID="$!" # let it settle down sleep 1 # Assign the second task to the cgroup echo LOG: moving the second DL task to the cpuset echo "$CPUSET_PID" > dl-group/cgroup.procs 2> /dev/null CPUSET_ALLOWED=`cat /proc/$CPUSET_PID/status | grep Cpus_allowed_list | awk '{print $2}'` chrt -p -d --sched-period 10 --sched-runtime 1 0 $CPUSET_PID ACCEPTED=$? if [ $ACCEPTED == 0 ]; then echo PASS: the task became DL else echo FAIL: the task was rejected as DL fi # Just ignore the clean up exec > /dev/null 2>&1 kill -9 $CPUSET_PID kill -9 $ROOT_PID rmdir dl-group - >% - Long story short: the sleep task is (not runnable) on a cpu != 0. After moving to a cpuset with only the CPU 0, task_cpu() returns a cpu that does not belong to the cpuset the task is in, and the task is rejected in this if: - %< - __sched_setscheduler(): [...] rq = task_rq_lock(p, ); <-- uses task_cpu(), that points to <-- the old cpu. [...] if (dl_bandwidth_enabled() && dl_policy(policy) && !(attr->sched_flags & SCHED_FLAG_SUGOV)) { cpumask_t *span = rq->rd->span;<--- wrong rd! /* * Don't allow tasks with an affinity mask smaller than * the entire root_domain to become SCHED_DEADLINE. We * will also fail if there's no bandwidth available. */ if (!cpumask_subset(span, p->cpus_ptr) || rq->rd->dl_bw.bw == 0) { retval = -EPERM; <--- returns here. goto unlock; } } - >% - Because the rq, and so the root domain, corresponding to the ones of the CPU in which the sleep command went to... sleep, not the ones it will run in the next wakeup because of its affinity. To avoid this problem, use the dl_task* helpers that return the task cpu, root domain, and the "root" dl_bw, aware of the status of task->cpu. Reported-by: Marco Perronet Signed-off-by: Daniel Bristot de Oliveira Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Li Zefan Cc: Tejun Heo Cc: Johannes Weiner Cc: Valentin Schneider Cc: linux-kernel@vger.kernel.org Cc: cgro...@vger.kernel.org --- kernel/sched/core.c | 6 +++--- kernel/sched/deadline.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 5961a97541c2..3c2775e6869f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5905,15 +5905,15 @@ static int __sched_setscheduler(struct task_struct *p, #ifdef CONFIG_SMP if (dl_bandwidth_enabled() && dl_policy(policy) && !(attr->sched_flags & SCHED_FLAG_SUGOV)) { - cpumask_t *span = rq->rd->span; + struct root_domain *rd = dl_task_rd(p); /* * Don't allow tasks with an affinity mask smaller than * the entire root_domain to become SCHED_DEADLINE. We * will also fail if there's no bandwidth available. */ - if (!cpumask_subset(span, p->cpus_ptr) || - rq->rd->dl_bw.bw ==
[PATCH 2/6] sched/deadline: Inform dl_task_can_attach() if the cpuset is exclusive
In preparation for blocking SCHED_DEADLINE tasks on non-exclusive cpuset if bandwidth control is enabled. No functional changes. Signed-off-by: Daniel Bristot de Oliveira Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Li Zefan Cc: Tejun Heo cc: Johannes Weiner Cc: Valentin Schneider Cc: linux-kernel@vger.kernel.org Cc: cgro...@vger.kernel.org --- include/linux/sched.h | 2 +- kernel/cgroup/cpuset.c | 5 - kernel/sched/core.c | 4 ++-- kernel/sched/deadline.c | 3 ++- kernel/sched/sched.h| 2 +- 5 files changed, 10 insertions(+), 6 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 942b87f80cc7..9525fbe032c9 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1654,7 +1654,7 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags) } extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial); -extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allowed); +extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allowed, int exclusive); #ifdef CONFIG_SMP extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask); extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask); diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 53c70c470a38..c6513dfabd3d 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2132,6 +2132,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset) struct cgroup_subsys_state *css; struct cpuset *cs; struct task_struct *task; + int exclusive; int ret; /* used later by cpuset_attach() */ @@ -2146,8 +2147,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset) (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed))) goto out_unlock; + exclusive = is_cpu_exclusive(cs); + cgroup_taskset_for_each(task, css, tset) { - ret = task_can_attach(task, cs->cpus_allowed); + ret = task_can_attach(task, cs->cpus_allowed, exclusive); if (ret) goto out_unlock; ret = security_task_setscheduler(task); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f4aede34449c..5961a97541c2 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7124,7 +7124,7 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur, } int task_can_attach(struct task_struct *p, - const struct cpumask *cs_cpus_allowed) + const struct cpumask *cs_cpus_allowed, int exclusive) { int ret = 0; @@ -7143,7 +7143,7 @@ int task_can_attach(struct task_struct *p, } if (dl_task(p)) - ret = dl_task_can_attach(p, cs_cpus_allowed); + ret = dl_task_can_attach(p, cs_cpus_allowed, exclusive); out: return ret; diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index c97d2c707b98..943aa32cc1bc 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2862,7 +2862,8 @@ bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr) } #ifdef CONFIG_SMP -int dl_task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allowed) +int dl_task_can_attach(struct task_struct *p, + const struct cpumask *cs_cpus_allowed, int exclusive) { unsigned long flags, cap; unsigned int dest_cpu; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index f5acb6c5ce49..54881d99cebd 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -337,7 +337,7 @@ extern void __setparam_dl(struct task_struct *p, const struct sched_attr *attr); extern void __getparam_dl(struct task_struct *p, struct sched_attr *attr); extern bool __checkparam_dl(const struct sched_attr *attr); extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr); -extern int dl_task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allowed); +extern int dl_task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allowed, int exclusive); extern int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial); extern bool dl_cpu_busy(unsigned int cpu); -- 2.29.2
[PATCH 5/6] sched/deadline: Add helpers to get the correct root domain/span/dl_bw
task_cpu() is often used on SCHED_DEADLINE to access the root_domain and associated information. However, the task_cpu() does not always reflect the correct CPU. The reason being is shown in the code: >From kernel/sched/core: - %< - * p->on_rq <- { 0, 1 = TASK_ON_RQ_QUEUED, 2 = TASK_ON_RQ_MIGRATING }: * * is set by activate_task() and cleared by deactivate_task(), under * rq->lock. Non-zero indicates the task is runnable, the special * ON_RQ_MIGRATING state is used for migration without holding both * rq->locks. It indicates task_cpu() is not stable, see task_rq_lock(). [...] * task_cpu(p): is changed by set_task_cpu(), the rules are: * * - Don't call set_task_cpu() on a blocked task: * *We don't care what CPU we're not running on, this simplifies hotplug, *the CPU assignment of blocked tasks isn't required to be valid. - >% - So, a sleeping task will not have its task_cpu() stable, and this is causing problems on SCHED_DEADLINE. In preparation for fixing this problems, we need to add helper functions that return the dl_task_cpu, root_domain, and "root" dl_bw that reflects the current placement/affinity of the task. Note that these functions are only required on the code path that can happen when the task is not queued. Signed-off-by: Daniel Bristot de Oliveira Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Li Zefan Cc: Tejun Heo Cc: Johannes Weiner Cc: Valentin Schneider Cc: linux-kernel@vger.kernel.org Cc: cgro...@vger.kernel.org --- kernel/sched/sched.h | 43 +++ 1 file changed, 43 insertions(+) diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 54881d99cebd..5f3f3cb5a6ff 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2393,6 +2393,37 @@ void __dl_update(struct dl_bw *dl_b, s64 bw) rq->dl.extra_bw += bw; } } + +static inline +int dl_task_cpu(struct task_struct *p) +{ + /* +* We can only rely on task_cpu() of runnable tasks. +*/ + if (p->on_rq) + return task_cpu(p); + + /* +* The task_cpu() of non-runnable task migth be pointing a CPU +* it cannot run anymore (see set_task_cpu()). Hence, let's +* get a possible cpu from the current cpu_mask. +*/ + return cpumask_any(p->cpus_ptr); + +} + +static inline +struct root_domain *dl_task_rd(struct task_struct *p) +{ + int cpu = dl_task_cpu(p); + return cpu_rq(cpu)->rd; +} + +static inline +struct dl_bw *dl_task_root_bw(struct task_struct *p) +{ + return _task_rd(p)->dl_bw; +} #else static inline void __dl_update(struct dl_bw *dl_b, s64 bw) @@ -2401,6 +2432,18 @@ void __dl_update(struct dl_bw *dl_b, s64 bw) dl->extra_bw += bw; } + +static inline +int dl_task_cpu(struct task_struct *p) +{ + return 0; +} + +static inline +struct dl_bw *dl_task_root_bw(struct task_struct *p) +{ + return _rq(p)->dl.dl_bw; +} #endif -- 2.29.2
[PATCH 4/6] sched/deadline: Block DL tasks on non-exclusive cpuset if bandwitdh control is enable
The current SCHED_DEADLINE design supports only global scheduler, or variants of it, i.e., clustered and partitioned, via cpuset config. To enable the partitioning of a system with clusters of CPUs, the documentation advises the usage of exclusive cpusets, creating an exclusive root_domain for the cpuset. Attempts to change the cpu affinity of a thread to a cpu mask different from the root domain results in an error. For instance: - %< - [root@x1 linux]# chrt -d --sched-period 10 --sched-runtime 1 0 sleep 1 & [1] 69020 [root@x1 linux]# taskset -p -c 0 69020 pid 69020's current affinity list: 0-7 taskset: failed to set pid 69020's affinity: Device or resource busy - >% - However, such restriction can be bypassed by disabling the SCHED_DEADLINE admission test, under the assumption that the user is aware of the implications of such a decision. However, Marco Perronet noticed that it was possible to by-pass this mechanism because no restriction is currently imposed by the cpuset mechanism. For instance, this script: - %< - #!/bin/bash # Enter on the cgroup directory cd /sys/fs/cgroup/ # Check it if is cgroup v2 and enable cpuset if [ -e cgroup.subtree_control ]; then # Enable cpuset controller on cgroup v2 echo +cpuset > cgroup.subtree_control fi echo LOG: create a cpuset and assigned the CPU 0 to it # Create cpuset groups rmdir dl-group &> /dev/null mkdir dl-group # Restrict the task to the CPU 0 echo 0 > dl-group/cpuset.mems echo 0 > dl-group/cpuset.cpus # Place a task in the root cgroup echo LOG: dispatching the first DL task chrt -d --sched-period 10 --sched-runtime 1 0 sleep 100 & ROOT_PID="$!" ROOT_ALLOWED=`cat /proc/$ROOT_PID/status | grep Cpus_allowed_list | awk '{print $2}'` # Disapatch another task in the root cgroup, to move it later. echo LOG: dispatching the second DL task chrt -d --sched-period 10 --sched-runtime 1 0 sleep 100 & CPUSET_PID="$!" # let them settle down sleep 1 # Assign the second task to the cgroup echo LOG: moving the second DL task to the cpuset echo "$CPUSET_PID" > dl-group/cgroup.procs 2> /dev/null ACCEPTED=$? CPUSET_ALLOWED=`cat /proc/$CPUSET_PID/status | grep Cpus_allowed_list | awk '{print $2}'` if [ $ACCEPTED == 0 ]; then echo FAIL: a DL task was accepted on a non-exclusive cpuset else echo PASS: DL task was rejected on a non-exclusive cpuset fi if [ $ROOT_ALLOWED == $CPUSET_ALLOWED ]; then echo PASS: the affinity did not change: $CPUSET_ALLOWED == $ROOT_ALLOWED else echo FAIL: the cpu affinity is different: $CPUSET_ALLOWED == $ROOT_ALLOWED fi # Just ignore the clean up exec > /dev/null 2>&1 kill -9 $CPUSET_PID kill -9 $ROOT_PID rmdir dl-group - >% - Shows these results: - %< - LOG: create a cpuset and assigned the CPU 0 to it LOG: dispatching the first DL task LOG: dispatching the second DL task LOG: moving the second DL task to the cpuset FAIL: a DL task was accepted on a non-exclusive cpuset FAIL: the cpu affinity is different: 0 == 0-3 - >% - This result is a problem because the two tasks have a different cpu mask, but they end up sharing the cpu 0, which is something not supported in the current SCHED_DEADLINE designed (APA - Arbitrary Processor Affinities). To avoid such scenario, the correct action to be taken is rejecting the attach of SCHED_DEADLINE thread to a non-exclusive cpuset. With the proposed patch in place, the script above returns: - %< - LOG: create a cpuset and assigned the CPU 0 to it LOG: dispatching the first DL task LOG: dispatching the second DL task LOG: moving the second DL task to the cpuset PASS: DL task was rejected on a non-exclusive cpuset PASS: the affinity did not change: 0-3 == 0-3 - >% - Still, likewise for taskset, this restriction can be bypassed by disabling the admission test, i.e.: # sysctl -w kernel.sched_rt_runtime_us=-1 and work at their own risk. Reported-by: Marco Perronet Signed-off-by: Daniel Bristot de Oliveira Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Li Zefan Cc: Tejun Heo Cc: Johannes Weiner Cc: Valentin Schneider Cc: linux-kernel@vger.kernel.org Cc: cgro...@vger.kernel.org --- kernel/sched/deadline.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 788a391657a5..c221e14d5b86 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2878,6 +2878,13 @@ int dl_task_can_attach(struct task_struct *p, if (cpumask_empty(cs_cpus_allowed)) return 0; + /* +* Do not allow moving tasks to non-exclusive cpusets +* if bandwidth control is enabled. +*/ + if (dl_
[PATCH 3/6] sched/deadline: Allow DL tasks on empty (cgroup v2) cpusets
cgroups v2 allows the cpuset controller to be enabled/disabled on demand. On Fedora 32, cpuset is disabled by default. To enable it, a user needs to: # cd /sys/fs/cgroup/ # echo +cpuset > cgroup.subtree_control Existing cgroups will expose the cpuset interface (e.g., cpuset.cpus file). By default, cpuset.cpus has no CPU assigned, which means that existing tasks will move to a cpuset without cpus. With that in mind, look what happens if a SCHED_DEADLINE task exists on any cgroup (user.slice by default on Fedora): - %< - # chrt -d --sched-period 10 --sched-runtime 1 0 sleep 100 & # cd /sys/fs/cgroup/ # echo '+cpuset' > cgroup.subtree_control [ 65.384041] BUG: unable to handle page fault for address: b720f7e0 [ 65.384551] #PF: supervisor read access in kernel mode [ 65.384923] #PF: error_code(0x) - not-present page [ 65.385298] PGD 61a15067 P4D 61a15067 PUD 61a16063 PMD 800f9ddff062 [ 65.385781] Oops: [#1] SMP PTI [ 65.386042] CPU: 0 PID: 799 Comm: sh Not tainted 5.10.0-rc3 #1 [ 65.386461] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 65.387077] RIP: 0010:dl_task_can_attach+0x40/0x250 [ 65.387429] Code: 54 55 53 48 83 ec 18 48 89 3c 24 bf ff ff ff ff e8 05 a2 52 00 4c 63 f0 48 c7 c5 00 9e 02 00 4a 8b 04 f5 00 09 47 b6 48 89 ea <4c> 8b a4 10 e0 09 00 00 49 8d 44 24 40 48 89 c7 48 89 44 24 08 e8 [ 65.388768] RSP: 0018:aee8c056fcd8 EFLAGS: 00010283 [ 65.389148] RAX: b71e5000 RBX: aee8c056fdd0 RCX: 0040 [ 65.389661] RDX: 00029e00 RSI: 9db202534e48 RDI: b6d3a3e0 [ 65.390174] RBP: 00029e00 R08: R09: 0004 [ 65.390686] R10: 0001 R11: ffa6fbff R12: aee8c056fbf0 [ 65.391196] R13: 9db2024e1400 R14: 0004 R15: 9db20ebb31e0 [ 65.391710] FS: 7f6df41b1740() GS:9db377c0() knlGS: [ 65.392289] CS: 0010 DS: ES: CR0: 80050033 [ 65.392705] CR2: b720f7e0 CR3: 00010680a003 CR4: 00370ef0 [ 65.393220] DR0: DR1: DR2: [ 65.393732] DR3: DR6: fffe0ff0 DR7: 0400 [ 65.394244] Call Trace: [ 65.394437] cpuset_can_attach+0x8b/0x110 [ 65.394732] cgroup_migrate_execute+0x70/0x430 [ 65.395057] cgroup_update_dfl_csses+0x222/0x230 [ 65.395392] cgroup_subtree_control_write+0x2c6/0x3c0 [ 65.395759] kernfs_fop_write+0xce/0x1b0 [ 65.396048] vfs_write+0xc2/0x230 [ 65.396291] ksys_write+0x4f/0xc0 [ 65.396533] do_syscall_64+0x33/0x40 [ 65.396797] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 65.397166] RIP: 0033:0x7f6df42a6537 [ 65.397428] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 65.398766] RSP: 002b:7ffee4128018 EFLAGS: 0246 ORIG_RAX: 0001 [ 65.399838] RAX: ffda RBX: 0008 RCX: 7f6df42a6537 [ 65.400923] RDX: 0008 RSI: 55b3f7e549e0 RDI: 0001 [ 65.402003] RBP: 55b3f7e549e0 R08: 000a R09: 0007 [ 65.403082] R10: 0004 R11: 0246 R12: 0008 [ 65.404156] R13: 7f6df4378500 R14: 0008 R15: 7f6df4378700 [ 65.405218] Modules linked in: [ 65.414172] CR2: b720f7e0 [ 65.415117] ---[ end trace 2dbff1a688549e65 ]--- - >% - That happens because on dl_task_can_attach(): dest_cpu = cpumask_any_and(cpu_active_mask, cs_cpus_allowed); returns a non active cpu. Initially, I thought about returning an error and blocking the operation. However, that is indeed not needed. The cpuset without CPUs assigned will be a non-root cpuset, hence its cpu mask will be the same as the root one. So, the bandwidth was already accounted, and the task can proceed. Signed-off-by: Daniel Bristot de Oliveira Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Li Zefan Cc: Tejun Heo Cc: Johannes Weiner Cc: Valentin Schneider Cc: linux-kernel@vger.kernel.org Cc: cgro...@vger.kernel.org --- kernel/sched/deadline.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 943aa32cc1bc..788a391657a5 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2871,6 +2871,13 @@ int dl_task_can_attach(struct task_stru
[PATCH 1/6] sched/deadline: Consolidate the SCHED_DL task_can_attach() check on its own function
It is easier to track the restrictions imposed on SCHED_DEADLINE tasks if we keep them all together in a single function. No function changes. Signed-off-by: Daniel Bristot de Oliveira Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Li Zefan Cc: Tejun Heo Cc: Johannes Weiner Cc: Valentin Schneider Cc: linux-kernel@vger.kernel.org Cc: cgro...@vger.kernel.org --- kernel/sched/core.c | 3 +-- kernel/sched/deadline.c | 7 +++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7af80c3fce12..f4aede34449c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7142,8 +7142,7 @@ int task_can_attach(struct task_struct *p, goto out; } - if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span, - cs_cpus_allowed)) + if (dl_task(p)) ret = dl_task_can_attach(p, cs_cpus_allowed); out: diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 75686c6d4436..c97d2c707b98 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2870,6 +2870,13 @@ int dl_task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allo bool overflow; int ret; + /* +* The task is not moving to another root domain, so it is +* already accounted. +*/ + if (cpumask_intersects(task_rq(p)->rd->span, cs_cpus_allowed)) + return 0; + dest_cpu = cpumask_any_and(cpu_active_mask, cs_cpus_allowed); rcu_read_lock_sched(); -- 2.29.2
[PATCH 0/6] sched/deadline: cpuset task acceptance review
While surveying the properties of the SCHED_DEADLINE, Marco Perronet found some inconsistencies in the acceptance of DL threads on cpuset. More precisely, regarding the acceptance of treads with arbitrary affinity. He contacted me, and while doing the investigation, we found yet other potential issues addressed in this patch series. Each patch has a more in-depth explanation, including ways to reproduce the problem. Daniel Bristot de Oliveira (6): sched/deadline: Consolidate the SCHED_DL task_can_attach() check on its own function sched/deadline: Inform dl_task_can_attach() if the cpuset is exclusive sched/deadline: Allow DL tasks on empty (cgroup v2) cpusets sched/deadline: Block DL tasks on non-exclusive cpuset if bandwitdh control is enable sched/deadline: Add helpers to get the correct root domain/span/dl_bw sched/deadline: Fixes cpu/rd/dl_bw references for suspended tasks include/linux/sched.h | 2 +- kernel/cgroup/cpuset.c | 5 - kernel/sched/core.c | 13 ++-- kernel/sched/deadline.c | 28 ++--- kernel/sched/sched.h| 45 - 5 files changed, 80 insertions(+), 13 deletions(-) -- 2.29.2
Re: [PATCH] notifier: Make atomic_notifiers use raw_spinlock
On 11/22/20 9:19 PM, Valentin Schneider wrote: > Booting a recent PREEMPT_RT kernel (v5.10-rc3-rt7-rebase) on my arm64 Juno > leads to the idle task blocking on an RT sleeping spinlock down some > notifier path: > > [1.809101] BUG: scheduling while atomic: swapper/5/0/0x0002 > [1.809116] Modules linked in: > [1.809123] Preemption disabled at: > [1.809125] secondary_start_kernel (arch/arm64/kernel/smp.c:227) > [1.809146] CPU: 5 PID: 0 Comm: swapper/5 Tainted: GW > 5.10.0-rc3-rt7 #168 > [1.809153] Hardware name: ARM Juno development board (r0) (DT) > [1.809158] Call trace: > [1.809160] dump_backtrace (arch/arm64/kernel/stacktrace.c:100 > (discriminator 1)) > [1.809170] show_stack (arch/arm64/kernel/stacktrace.c:198) > [1.809178] dump_stack (lib/dump_stack.c:122) > [1.809188] __schedule_bug (kernel/sched/core.c:4886) > [1.809197] __schedule (./arch/arm64/include/asm/preempt.h:18 > kernel/sched/core.c:4913 kernel/sched/core.c:5040) > [1.809204] preempt_schedule_lock (kernel/sched/core.c:5365 > (discriminator 1)) > [1.809210] rt_spin_lock_slowlock_locked (kernel/locking/rtmutex.c:1072) > [1.809217] rt_spin_lock_slowlock (kernel/locking/rtmutex.c:1110) > [1.809224] rt_spin_lock (./include/linux/rcupdate.h:647 > kernel/locking/rtmutex.c:1139) > [1.809231] atomic_notifier_call_chain_robust (kernel/notifier.c:71 > kernel/notifier.c:118 kernel/notifier.c:186) > [1.809240] cpu_pm_enter (kernel/cpu_pm.c:39 kernel/cpu_pm.c:93) > [1.809249] psci_enter_idle_state (drivers/cpuidle/cpuidle-psci.c:52 > drivers/cpuidle/cpuidle-psci.c:129) > [1.809258] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:238) > [1.809267] cpuidle_enter (drivers/cpuidle/cpuidle.c:353) > [1.809275] do_idle (kernel/sched/idle.c:132 kernel/sched/idle.c:213 > kernel/sched/idle.c:273) > [1.809282] cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1)) > [1.809288] secondary_start_kernel (arch/arm64/kernel/smp.c:273) > > Two points worth noting: > > 1) That this is conceptually the same issue as pointed out in: >313c8c16ee62 ("PM / CPU: replace raw_notifier with atomic_notifier") > 2) Only the _robust() variant of atomic_notifier callchains suffer from >this > > AFAICT only the cpu_pm_notifier_chain really needs to be changed, but > singling it out would mean introducing a new (truly) non-blocking API. At > the same time, callers that are fine with any blocking within the call > chain should use blocking notifiers, so patching up all atomic_notifier's > doesn't seem *too* crazy to me. > > Fixes: 70d932985757 ("notifier: Fix broken error handling pattern") > Signed-off-by: Valentin Schneider Reviewed-by: Daniel Bristot de Oliveira Thanks! -- Daniel
Re: [PATCH v2] sched/deadline: Fix priority inheritance with multiple scheduling classes
On 11/17/20 7:14 AM, Juri Lelli wrote: > Glenn reported that "an application [he developed produces] a BUG in > deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested > PTHREAD_PRIO_INHERIT mutexes. I believe the bug is triggered when a CFS > task that was boosted by a SCHED_DEADLINE task boosts another CFS task > (nested priority inheritance). > > [ cut here ] > kernel BUG at kernel/sched/deadline.c:1462! > invalid opcode: [#1] PREEMPT SMP > CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ... > Hardware name: ... > RIP: 0010:enqueue_task_dl+0x335/0x910 > Code: ... > RSP: 0018:c9000c2bbc68 EFLAGS: 00010002 > RAX: 0009 RBX: 888c0af94c00 RCX: 81e12500 > RDX: 002e RSI: 888c0af94c00 RDI: 888c10b22600 > RBP: c9000c2bbd08 R08: 0009 R09: 0078 > R10: 81e12440 R11: 81e1236c R12: 888bc8932600 > R13: 888c0af94eb8 R14: 888c10b22600 R15: 888bc8932600 > FS: 7fa58ac55700() GS:888c10b0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 7fa58b523230 CR3: 000bf44ab003 CR4: 007606e0 > DR0: DR1: DR2: > DR3: DR6: fffe0ff0 DR7: 0400 > PKRU: 5554 > Call Trace: > ? intel_pstate_update_util_hwp+0x13/0x170 > rt_mutex_setprio+0x1cc/0x4b0 > task_blocks_on_rt_mutex+0x225/0x260 > rt_spin_lock_slowlock_locked+0xab/0x2d0 > rt_spin_lock_slowlock+0x50/0x80 > hrtimer_grab_expiry_lock+0x20/0x30 > hrtimer_cancel+0x13/0x30 > do_nanosleep+0xa0/0x150 > hrtimer_nanosleep+0xe1/0x230 > ? __hrtimer_init_sleeper+0x60/0x60 > __x64_sys_nanosleep+0x8d/0xa0 > do_syscall_64+0x4a/0x100 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x7fa58b52330d > ... > ---[ end trace 0002 ]— > > He also provided a simple reproducer creating the situation below: > > So the execution order of locking steps are the following > (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2 > are mutexes that are enabled * with priority inheritance.) > > Time moves forward as this timeline goes down: > > N1 N2 D1 > | || > | || > Lock(M1)|| > | || > | Lock(M2) | > | || > | | Lock(M2) > | || > | Lock(M1) | > | (!!bug triggered!) | > > Daniel reported a similar situation as well, by just letting ksoftirqd > run with DEADLINE (and eventually block on a mutex). > > Problem is that boosted entities (Priority Inheritance) use static > DEADLINE parameters of the top priority waiter. However, there might be > cases where top waiter could be a non-DEADLINE entity that is currently > boosted by a DEADLINE entity from a different lock chain (i.e., nested > priority chains involving entities of non-DEADLINE classes). In this > case, top waiter static DEADLINE parameters could be null (initialized > to 0 at fork()) and replenish_dl_entity() would hit a BUG(). > > Fix this by keeping track of the original donor and using its parameters > when a task is boosted. > > Reported-by: Glenn Elliott > Reported-by: Daniel Bristot de Oliveira > Signed-off-by: Juri Lelli Tested-by: Daniel Bristot de Oliveira Thanks! -- Daniel
Re: [PATCH v4 00/19] sched: Migrate disable support
On 10/23/20 12:11 PM, Peter Zijlstra wrote: > Hi, > > The fourth version of migrate_disable() for PREEMPT_RT. > > Two changes since last time: > > - fixes !SMP builds (bigeasy) > - TLA+ validation of migrate_disable() vs sched_setaffinity() (valsch) > > Esp. that latter resulted in significant changes to patch #10. Huge thanks to > Valentin. While I will still work on this, testing more and trying to analyze the effects from the (rt) scheduling perspective, I do not see any other issues (kudos to Valentin, impressive review). Feel free to add: Reviewed-by: Daniel Bristot de Oliveira Thanks! -- Daniel
Re: [PATCH] sched/deadline: Fix priority inheritance with multiple scheduling classes
On 11/5/20 5:12 PM, Juri Lelli wrote: > Hi Valentin, > > On 05/11/20 15:49, Valentin Schneider wrote: >> >> Hi Juri, >> >> On 05/11/20 07:50, Juri Lelli wrote: >>> He also provided a simple reproducer creating the situation below: >>> >>> So the execution order of locking steps are the following >>> (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2 >>> are mutexes that are enabled * with priority inheritance.) >>> >>> Time moves forward as this timeline goes down: >>> >>> N1 N2 D1 >>> | || >>> | || >>> Lock(M1)|| >>> | || >>> | Lock(M2) | >>> | || >>> | | Lock(M2) >>> | || >>> | Lock(M1) | >>> | (!!bug triggered!) | >>> >>> Daniel reported a similar situation as well, by just letting ksoftirqd >>> run with DEADLINE (and eventually block on a mutex). >>> >>> Problem is that boosted entities (Priority Inheritance) use static >>> DEADLINE parameters of the top priority waiter. However, there might be >>> cases where top waiter could be a non-DEADLINE entity that is currently >>> boosted by a DEADLINE entity from a different lock chain (i.e., nested >>> priority chains involving entities of non-DEADLINE classes). In this >>> case, top waiter static DEADLINE parameters could be null (initialized >>> to 0 at fork()) and replenish_dl_entity() would hit a BUG(). >>> >> >> IIUC, rt_mutex_get_top_task(N1) == N2, and N2->dl->deadline = 0, which >> makes enqueue_task_dl() unhappy. And that happens because, unlike p->prio, >> DL parameters aren't properly propagated through the chain(s). > > Yep. Replenishment as well, as it's going to find dl_runtime == 0. > >>> Fix this by keeping track of the original donor and using its parameters >>> when a task is boosted. >>> >>> Reported-by: Glenn Elliott >>> Reported-by: Daniel Bristot de Oliveira >>> Signed-off-by: Juri Lelli >>> >>> --- >>> >>> This is actually a v2 attempt (didn't keep $SUBJECT since it's quite >>> different than v1 [1]) to fix this problem. >>> >>> v1 was admittedly pretty ugly. Hope this looks slightly better (even >>> though there is of course overhead associated to the additional >>> pointer). >>> >>> Again, the proper way to fix this is by proxy execution. But, I don't >>> think we are yet there and this problem needs a quick band-aid. >>> >>> One could probably also think to complicate the present approach and >>> actually perform accounting using donor's dynamic parameters, but I fear >>> it would be of little benefit since it would still bring all the >>> problems associated to affinities. So, I'd propose let's try to fix all >>> this properly with proxy and just avoid crashes in the meantime. >>> >> >> For my own sake, what affinity problems are you thinking of? >> >> With proxy exec we have this "funny" dance of shoving the entire blocked-on >> chain on a single runqueue to get the right selection out of >> pick_next_task(), and that needs to deal with affinity (i.e. move the task >> back to a sensible rq once it becomes runnable). >> >> With the current PI, the waiting tasks are blocked and enqueued in the >> pi_waiters tree, so as I see it affinity shouldn't matter; what am I >> missing / not seeing? Is that related to bandwidth handling? > > Think we might break admission control checks if donor and bosted are, > for example, on different exclusive sets of CPUs. Guess that is a > problem with proxy as well, though. As said in the comment above, this > is unfortunately not much more than a band-aid. Hoping we can buy us > some time and fix it properly with proxy. I agree with Juri that the current approach is known to be broken, and that the proxy execution seems to be the mechanisms to go to try to address these problems. However, this will take some time. Meanwhile, this patch that Juri proposes fixes problem in the current mechanism - using the same approach (and breaking in a known way :D). A proper way to handle the priority inversion with a disjoint set of CPUs is something that will also be an issue with proxy execution. But that is an even
Re: [PATCH v6 0/2] sched/deadline: Fix and optimize sched_dl_global_validate()
Hi, On 10/14/20 3:32 PM, Juri Lelli wrote: > Hi, > > On 08/10/20 23:47, Peng Liu wrote: >> When change global rt bandwidth, we check to make sure that new >> settings could accommodate the allocated dl bandwidth. >> >> Under SMP, the dl_bw is on a per root domain basis, currently we check >> and update the new settings one cpu by one cpu, but not in the unit of >> root domain, which is either overdoing or error. >> >> patch 1 removed the superfluous checking and updating >> patch 2 fixed the error validation >> >> For details, please see the corresponding patch. >> >> >> v6 <-- v5: >> - no functional changes, just revert visit_gen back to u64; >> >> v5 <-- v4: >> - no functional changes, just split the v4 single patch to two to >>obey the "one patch do only one thing" rule; >> - turn root_domain::visit_gen from u64 to u32; >>both suggested by Juri. >> - refine changelog; >> >> v4 <-- v3: >> - refine changelog; >> - eliminate the ugly #ifdef guys with Peter's method; >> >> v3 <-- v2: >> - fix build error for !CONFIG_SMP, reported by kernel test robot; >> >> v2 <-- v1: >> - replace cpumask_weight(cpu_rq(cpu)->rd->span) with dl_bw_cpus(cpu), >>suggested by Juri; >> >> Peng Liu (2): >> sched/deadline: Optimize sched_dl_global_validate() >> sched/deadline: Fix sched_dl_global_validate() >> >> kernel/sched/deadline.c | 44 +++ >> kernel/sched/sched.h| 51 ++--- >> kernel/sched/topology.c | 1 + >> 3 files changed, 63 insertions(+), 33 deletions(-) > > These look now good to me. Thanks a lot! > > Acked-by: Juri Lelli I just found a minor thing in a comment: "It's *monotonously* increasing." I tried to find the usage of "monotonously increasing" for monotonic functions, and I did not find it. The (as least most used) term is "monotonically," so the sentence would be like "It is a monotonically increasing value." But it is just a minor thing. Anyways: Reviewed-by: Daniel Bristot de Oliveira Thanks! -- Daniel
[tip: sched/core] sched/deadline: Unthrottle PI boosted threads while enqueuing
The following commit has been merged into the sched/core branch of tip: Commit-ID: feff2e65efd8d84cf831668e182b2ce73c604bbb Gitweb: https://git.kernel.org/tip/feff2e65efd8d84cf831668e182b2ce73c604bbb Author:Daniel Bristot de Oliveira AuthorDate:Wed, 16 Sep 2020 09:06:39 +02:00 Committer: Peter Zijlstra CommitterDate: Sat, 03 Oct 2020 16:30:53 +02:00 sched/deadline: Unthrottle PI boosted threads while enqueuing stress-ng has a test (stress-ng --cyclic) that creates a set of threads under SCHED_DEADLINE with the following parameters: dl_runtime = 1 (10 us) dl_deadline = 10 (100 us) dl_period= 10 (100 us) These parameters are very aggressive. When using a system without HRTICK set, these threads can easily execute longer than the dl_runtime because the throttling happens with 1/HZ resolution. During the main part of the test, the system works just fine because the workload does not try to run over the 10 us. The problem happens at the end of the test, on the exit() path. During exit(), the threads need to do some cleanups that require real-time mutex locks, mainly those related to memory management, resulting in this scenario: Note: locks are rt_mutexes... TASK A: TASK B: TASK C: activation activation activation lock(a): OK!lock(b): OK! lock(a) -> block (task A owns it) -> self notice/set throttled +--< -> arm replenished timer | switch-out | lock(b) | -> B prio> | -> boost TASK B | unlock(a) switch-out | -> handle lock a to B |-> wakeup(B) | -> B is throttled: |-> do not enqueue | switch-out | | +-> replenishment timer -> TASK B is boosted: -> do not enqueue BOOM: TASK B is runnable but !enqueued, holding TASK C: the system crashes with hung task C. This problem is avoided by removing the throttle state from the boosted thread while boosting it (by TASK A in the example above), allowing it to be queued and run boosted. The next replenishment will take care of the runtime overrun, pushing the deadline further away. See the "while (dl_se->runtime <= 0)" on replenish_dl_entity() for more information. Reported-by: Mark Simmons Signed-off-by: Daniel Bristot de Oliveira Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Juri Lelli Tested-by: Mark Simmons Link: https://lkml.kernel.org/r/5076e003450835ec74e6fa5917d02c4fa41687e6.1600170294.git.bris...@redhat.com --- kernel/sched/deadline.c | 21 + 1 file changed, 21 insertions(+) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index c19c188..6d93f45 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1525,6 +1525,27 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags) */ if (pi_task && dl_prio(pi_task->normal_prio) && p->dl.dl_boosted) { pi_se = _task->dl; + /* +* Because of delays in the detection of the overrun of a +* thread's runtime, it might be the case that a thread +* goes to sleep in a rt mutex with negative runtime. As +* a consequence, the thread will be throttled. +* +* While waiting for the mutex, this thread can also be +* boosted via PI, resulting in a thread that is throttled +* and boosted at the same time. +* +* In this case, the boost overrides the throttle. +*/ + if (p->dl.dl_throttled) { + /* +* The replenish timer needs to be canceled. No +* problem if it fires concurrently: boosted threads +* are ignored in dl_task_timer(). +*/ + hrtimer_try_to_cancel(>dl.dl_timer); + p->dl.dl_throttled = 0; + } } else if (!dl_prio(p->normal_prio)) { /* * Special case in which we have a !SCHED_DEADLINE task that is going
Re: [PATCH] sched/deadline: Unthrottle PI boosted threads while enqueuing
On 9/18/20 8:00 AM, Juri Lelli wrote: > Hi Daniel, > > On 16/09/20 09:06, Daniel Bristot de Oliveira wrote: >> stress-ng has a test (stress-ng --cyclic) that creates a set of threads >> under SCHED_DEADLINE with the following parameters: >> >> dl_runtime = 1 (10 us) >> dl_deadline = 10 (100 us) >> dl_period= 10 (100 us) >> >> These parameters are very aggressive. When using a system without HRTICK >> set, these threads can easily execute longer than the dl_runtime because >> the throttling happens with 1/HZ resolution. >> >> During the main part of the test, the system works just fine because >> the workload does not try to run over the 10 us. The problem happens at >> the end of the test, on the exit() path. During exit(), the threads need >> to do some cleanups that require real-time mutex locks, mainly those >> related to memory management, resulting in this scenario: >> >> Note: locks are rt_mutexes... >> >> TASK A: TASK B: TASK C: >> activation >> activation >> activation >> >> lock(a): OK! lock(b): OK! >> >> lock(a) >> -> block (task A owns it) >>-> self notice/set throttled >> +--< -> arm replenished timer >> | switch-out >> | lock(b) >> | -> B >> prio> >> | -> boost TASK B >> | unlock(a)switch-out >> | -> handle lock a to B >> |-> wakeup(B) >> | -> B is throttled: >> |-> do not enqueue >> | switch-out >> | >> | >> +-> replenishment timer >> -> TASK B is boosted: >>-> do not enqueue >> >> >> BOOM: TASK B is runnable but !enqueued, holding TASK C: the system >> crashes with hung task C. >> >> This problem is avoided by removing the throttle state from the boosted >> thread while boosting it (by TASK A in the example above), allowing it to >> be queued and run boosted. >> >> The next replenishment will take care of the runtime overrun, pushing >> the deadline further away. See the "while (dl_se->runtime <= 0)" on >> replenish_dl_entity() for more information. >> >> Signed-off-by: Daniel Bristot de Oliveira >> Reported-by: Mark Simmons >> Reviewed-by: Juri Lelli >> Tested-by: Mark Simmons >> Cc: Ingo Molnar >> Cc: Peter Zijlstra >> Cc: Juri Lelli >> Cc: Vincent Guittot >> Cc: Dietmar Eggemann >> Cc: Steven Rostedt >> Cc: Ben Segall >> Cc: Mel Gorman >> Cc: Daniel Bristot de Oliveira >> Cc: linux-kernel@vger.kernel.org >> >> --- > > Thanks for this fix. > > Acked-by: Juri Lelli This is a gentle ping... [we are facing this bug in practice :-(]. -- Daniel > Best, > Juri >
[tip: sched/core] sched/rt: Disable RT_RUNTIME_SHARE by default
The following commit has been merged into the sched/core branch of tip: Commit-ID: 2586af1ac187f6b3a50930a4e33497074e81762d Gitweb: https://git.kernel.org/tip/2586af1ac187f6b3a50930a4e33497074e81762d Author:Daniel Bristot de Oliveira AuthorDate:Mon, 21 Sep 2020 16:39:49 +02:00 Committer: Peter Zijlstra CommitterDate: Fri, 25 Sep 2020 14:23:24 +02:00 sched/rt: Disable RT_RUNTIME_SHARE by default The RT_RUNTIME_SHARE sched feature enables the sharing of rt_runtime between CPUs, allowing a CPU to run a real-time task up to 100% of the time while leaving more space for non-real-time tasks to run on the CPU that lend rt_runtime. The problem is that a CPU can easily borrow enough rt_runtime to allow a spinning rt-task to run forever, starving per-cpu tasks like kworkers, which are non-real-time by design. This patch disables RT_RUNTIME_SHARE by default, avoiding this problem. The feature will still be present for users that want to enable it, though. Signed-off-by: Daniel Bristot de Oliveira Signed-off-by: Peter Zijlstra (Intel) Tested-by: Wei Wang Link: https://lkml.kernel.org/r/b776ab46817e3db5d8ef79175fa0d71073c051c7.1600697903.git.bris...@redhat.com --- kernel/sched/features.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/features.h b/kernel/sched/features.h index 7481cd9..68d369c 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -77,7 +77,7 @@ SCHED_FEAT(WARN_DOUBLE_CLOCK, false) SCHED_FEAT(RT_PUSH_IPI, true) #endif -SCHED_FEAT(RT_RUNTIME_SHARE, true) +SCHED_FEAT(RT_RUNTIME_SHARE, false) SCHED_FEAT(LB_MIN, false) SCHED_FEAT(ATTACH_AGE_LOAD, true)
Re: [ANNOUNCE][CFP] Real-Time Linux Mini-Summit 2020 - Virtual
Hi all! This is a gentle reminder about the CFP deadline for the Real-time Linux Mini-Summit 2020. The CFP will be open until September 30th, 2020, at 23:59 PST. IOW: this Wednesday (in the previous mail I said that Sept 30th was on Saturday, which is not the case (at last, not this year :-)) my bad) Thanks! On 9/18/20 5:17 PM, Daniel Bristot de Oliveira wrote: > The Real-Time Mini-Summit is organized by the Linux Foundation Real-Time Linux > (RTL) collaborative project. The event is intended to gather developers and > users of Linux as a Real-Time Operating System. The main intent is to provide > room for discussion between developers, tooling experts, and users. > > The mini-summit will take place alongside the Open Source Summit + Embedded > Linux Conference Europe 2020, virtually. The summit is planned last day of the > main conference, Friday, October 30th, 2020. > > If you are interested to present, please submit a proposal [1] before > September > 30th, 2020, at 23:59 EST. Please provide a title, an abstract describing the > proposed talk, a short biography, and describe the targeted audience. Please > indicate the slot length you are aiming for: The format is a single track with > presentation slots of 15, 30 minutes long. Considering that the presentation > should leave a space for discussion. > > We are welcoming presentations from both end-users and developers, on topics > covering, but not limited to: > > - Real-time Linux development > - Real-time Linux evaluation > - Real-time Linux use cases (Success and failures) > - Real-time Linux tooling (tracing, configuration, …) > > Those can cover recently available technologies, ongoing work, and new ideas. > > Important Notes for Speakers: > > All speakers are required to adhere to the Linux Foundation events’ Code of > Conduct. We also highly recommend that speakers take the Linux Foundation > online > Inclusive Speaker Orientation Course. Avoid sales or marketing pitches and > discussing unlicensed or potentially closed-source technologies when preparing > your proposal; these talks are almost always rejected due to the fact that > they > take away from the integrity of our events, and are rarely well-received by > conference attendees. > > All accepted speakers are required to submit their slides prior to the event. > > [1] Submission page: https://forms.gle/yQeqyrtJYezM5VRJA > > Important Dates: > > CFP Close: Wednesday, September 30th, 2020, at 23:59 PST > Speaker notification: October 10th, 2020 > Conference: Friday, October 30th, 2020 > > Questions on submitting a proposal? > Email Daniel Bristot de Oliveira >
Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
On 9/24/20 10:27 AM, pet...@infradead.org wrote: > So my current todo list is: > > - Change RT PULL > - Change DL PULL > - Add migrate_disable() tracer; exactly like preempt/irqoff, except >measuring task-runtime instead of cpu-time. > - Add a mode that measures actual interference. > - Add a traceevent to detect preemption in migrate_disable(). > > > And then I suppose I should twist Daniel's arm to update his model to > include these scenarios and numbers. Challenge accepted :-) -- Daniel
Re: [PATCH 7/9] sched: Add migrate_disable()
On 9/23/20 7:08 PM, pet...@infradead.org wrote: > On Wed, Sep 23, 2020 at 10:31:10AM +0200, Thomas Gleixner wrote: >>In practice migrate disable could be taken into account on placement >>decisions, but yes we don't have anything like that at the moment. > I think at the very least we should do some of that. > > The premise is wanting to run the M highest priority tasks, when a CPU > drops priority, it tries to PULL a higher priority task towards itself. > If this PULL selects a migrate_disable() tasks, it means the task is in > the M highest prio tasks. > > Since obviously that task cannot get pulled, we should then pull the > current running task of that CPU, this would then allow the > migrate_disable() task to resume execution. > > I'll go try and cook up something along those lines... > sounds promising... -- Daniel
Re: [PATCH 7/9] sched: Add migrate_disable()
On 9/23/20 10:31 AM, Thomas Gleixner wrote: > On Mon, Sep 21 2020 at 22:42, Daniel Bristot de Oliveira wrote: >> On 9/21/20 9:16 PM, Thomas Gleixner wrote: >>> On Mon, Sep 21 2020 at 18:36, Peter Zijlstra wrote: >>> But seriously, I completely understand your concern vs. schedulability >>> theories, but those theories can neither deal well with preemption >>> disable simply because you can create other trainwrecks when enough low >>> priority tasks run long enough in preempt disabled regions in >>> parallel. The scheduler simply does not know ahead how long these >>> sections will take and how many of them will run in parallel. >>> >>> The theories make some assumptions about preempt disable and consider it >>> as temporary priority ceiling, but that's all assumptions as the bounds >>> of these operations simply unknown. >> >> Limited preemption is something that is more explored/well known than >> limited/arbitrary affinity - I even know a dude that convinced academics >> about >> the effects/properties of preempt disable on the PREEMPT_RT! > > I'm sure I never met that guy. It is a funny Italian/Brazilian dude >> But I think that the message here is that: ok, migrate disable is better for >> the >> "scheduling latency" than preempt disable (preempt rt goal). But the >> indiscriminate usage of migrate disable has some undesired effects for >> "response >> time" of real-time threads (scheduler goal), so we should use it with >> caution - >> as much as we have with preempt disable. In the end, both are critical for >> real-time workloads, and we need more work and analysis on them both. > ... >>> But as the kmap discussion has shown, the current situation of enforcing >>> preempt disable even on a !RT kernel is not pretty either. I looked at >>> quite some of the kmap_atomic() usage sites and the resulting >>> workarounds for non-preemptability are pretty horrible especially if >>> they do copy_from/to_user() or such in those regions. There is tons of >>> other code which really only requires migrate disable >> >> (not having an explicit declaration of the reason to disable preemption make >> these all hard to rework... and we will have the same with migrate disable. >> Anyways, I agree that disabling only migration helps -rt now [and I like >> that]... but I also fear/care for scheduler metrics on the long term... well, >> there is still a long way until retirement.) > > Lets have a look at theory and practice once more: > > 1) Preempt disable > >Theories take that into account by adding a SHC ('Sh*t Happens >Coefficient') into their formulas, but the practical effects cannot >ever be reflected in theories accurately. It depends, an adequate theory will have the correct balance between SHC and precision. The idea is to precisely define the behavior as much as possible, trying to reduce the SHC. >In practice, preempt disable can cause unbound latencies and while we >all agree that long preempt/interrupt disabled sections are bad, it's >not really trivial to break these up without rewriting stuff from >scratch. The recent discussion about unbound latencies in the page >allocator is a prime example for that. Here we need to separate two contexts: the synchronization and the code contexts. At the synchronization level the preempt_disable() is bounded [1]: The PREEMPT_RT can have only *1* preempt_disable()/enable() not to schedule section (the worst (SHC)) before starting the process of calling the scheduler. So the SHC factor is then reduced to the code context [2]. The SHC is in the code context, and it is mitigated by the constant monitoring of the code sections via tests. >The ever growing usage of per CPU storage is not making anything >better and right now preempt disable is the only tool we have at the >moment in mainline to deal with that. > >That forces people to come up with code constructs which are more >than suboptimal both in terms of code quality and in terms of >schedulability/latency. We've seen mutexes converted to spinlocks >just because of that, conditionals depending on execution context >which turns out to be broken and inconsistent, massive error handling >trainwrecks, etc. I see and agree with you on this point. > 2) Migrate disable > >Theories do not know anything about it, but in the very end it's >going to be yet another variant of SHC to be defined. I agree. There are very few things known at this point. However, we can take as exercise the example that Peter mentioned: CPU 0
Re: [PATCH] sched/rt: Disable RT_RUNTIME_SHARE by default
On 9/22/20 7:14 PM, Wei Wang wrote: > On Mon, Sep 21, 2020 at 7:40 AM Daniel Bristot de Oliveira > wrote: >> >> The RT_RUNTIME_SHARE sched feature enables the sharing of rt_runtime >> between CPUs, allowing a CPU to run a real-time task up to 100% of the >> time while leaving more space for non-real-time tasks to run on the CPU >> that lend rt_runtime. >> >> The problem is that a CPU can easily borrow enough rt_runtime to allow >> a spinning rt-task to run forever, starving per-cpu tasks like kworkers, >> which are non-real-time by design. >> >> This patch disables RT_RUNTIME_SHARE by default, avoiding this problem. >> The feature will still be present for users that want to enable it, >> though. >> >> Signed-off-by: Daniel Bristot de Oliveira >> Cc: Ingo Molnar >> Cc: Peter Zijlstra >> Cc: Juri Lelli >> Cc: Vincent Guittot >> Cc: Dietmar Eggemann >> Cc: Steven Rostedt >> Cc: Ben Segall >> Cc: Mel Gorman >> Cc: Daniel Bristot de Oliveira >> Cc: Thomas Gleixner >> Cc: linux-kernel@vger.kernel.org >> --- >> kernel/sched/features.h | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/kernel/sched/features.h b/kernel/sched/features.h >> index 7481cd96f391..68d369cba9e4 100644 >> --- a/kernel/sched/features.h >> +++ b/kernel/sched/features.h >> @@ -77,7 +77,7 @@ SCHED_FEAT(WARN_DOUBLE_CLOCK, false) >> SCHED_FEAT(RT_PUSH_IPI, true) >> #endif >> >> -SCHED_FEAT(RT_RUNTIME_SHARE, true) >> +SCHED_FEAT(RT_RUNTIME_SHARE, false) >> SCHED_FEAT(LB_MIN, false) >> SCHED_FEAT(ATTACH_AGE_LOAD, true) >> >> -- >> 2.26.2 >> > > Tested on an Android device and can no longer see long running RT > tasks (yes, Android have those for reasons). > So: Tested-by: Wei Wang ? Thanks! -- Daniel
Re: [PATCH 7/9] sched: Add migrate_disable()
On 9/21/20 9:16 PM, Thomas Gleixner wrote: > On Mon, Sep 21 2020 at 18:36, Peter Zijlstra wrote: >> Add the base migrate_disable() support (under protest). > > :) > >> +/* >> + * Migrate-Disable and why it is (strongly) undesired. >> + * >> + * The premise of the Real-Time schedulers we have on Linux >> + * (SCHED_FIFO/SCHED_DEADLINE) is that M CPUs can/will run M tasks >> + * concurrently, provided there are sufficient runnable tasks, also known as >> + * work-conserving. For instance SCHED_DEADLINE tries to schedule the M >> + * earliest deadline threads, and SCHED_FIFO the M highest priority threads. >> + * >> + * The correctness of various scheduling models depends on this, but is it >> + * broken by migrate_disable() that doesn't imply preempt_disable(). Where >> + * preempt_disable() implies an immediate priority ceiling, preemptible >> + * migrate_disable() allows nesting. >> + * >> + * The worst case is that all tasks preempt one another in a >> migrate_disable() >> + * region and stack on a single CPU. This then reduces the available >> bandwidth >> + * to a single CPU. And since Real-Time schedulability theory considers the >> + * Worst-Case only, all Real-Time analysis shall revert to single-CPU >> + * (instantly solving the SMP analysis problem). > > I'm telling you for years that SMP is the source of all evils and > NR_CPUS=0 is the ultimate solution of all problems. Paul surely > disagrees as he thinks that NR_CPUS<0 is the right thing to do. And I would not need to extend the model! > But seriously, I completely understand your concern vs. schedulability > theories, but those theories can neither deal well with preemption > disable simply because you can create other trainwrecks when enough low > priority tasks run long enough in preempt disabled regions in > parallel. The scheduler simply does not know ahead how long these > sections will take and how many of them will run in parallel. > > The theories make some assumptions about preempt disable and consider it > as temporary priority ceiling, but that's all assumptions as the bounds > of these operations simply unknown. Limited preemption is something that is more explored/well known than limited/arbitrary affinity - I even know a dude that convinced academics about the effects/properties of preempt disable on the PREEMPT_RT! But I think that the message here is that: ok, migrate disable is better for the "scheduling latency" than preempt disable (preempt rt goal). But the indiscriminate usage of migrate disable has some undesired effects for "response time" of real-time threads (scheduler goal), so we should use it with caution - as much as we have with preempt disable. In the end, both are critical for real-time workloads, and we need more work and analysis on them both. >> + * The reason we have it anyway. >> + * >> + * PREEMPT_RT breaks a number of assumptions traditionally held. By forcing >> a >> + * number of primitives into becoming preemptible, they would also allow >> + * migration. This turns out to break a bunch of per-cpu usage. To this end, >> + * all these primitives employ migirate_disable() to restore this implicit >> + * assumption. >> + * >> + * This is a 'temporary' work-around at best. The correct solution is >> getting >> + * rid of the above assumptions and reworking the code to employ explicit >> + * per-cpu locking or short preempt-disable regions. > > What timeframe are you envisioning for 'temporary'? I assume something > which is closer to your retirement than to mine :) /me counts how many years he still needs to wait for the retirement. > >> + * The end goal must be to get rid of migrate_disable(), alternatively we >> need >> + * a schedulability theory that does not depend on abritrary migration. > > Finally something new the academics can twist their brain around :) Like if there was not enough already :-) > But as the kmap discussion has shown, the current situation of enforcing > preempt disable even on a !RT kernel is not pretty either. I looked at > quite some of the kmap_atomic() usage sites and the resulting > workarounds for non-preemptability are pretty horrible especially if > they do copy_from/to_user() or such in those regions. There is tons of > other code which really only requires migrate disable (not having an explicit declaration of the reason to disable preemption make these all hard to rework... and we will have the same with migrate disable. Anyways, I agree that disabling only migration helps -rt now [and I like that]... but I also fear/care for scheduler metrics on the long term... well, there is still a long way until retirement.) Thanks! -- Daniel > Thanks, > > tglx >
[PATCH] x86/irq: Use printk_deferred() on raw_spin_lock() protected sections
While testing hotplug I got this BUG: = [ BUG: Invalid wait context ] 5.9.0-rc5+ #3 Not tainted - migration/2/20 is trying to lock: b4315778 (>lock){-.-.}-{3:3}, at: serial8250_console_write+0x8e/0x380 other info that might help us debug this: context-{5:5} 4 locks held by migration/2/20: #0: 91622a3ff4c0 (_desc_lock_class){-.-.}-{2:2}, at: irq_migrate_all_off_this_cpu+0x41/0x2f0 #1: b2c509b8 (vector_lock){-.-.}-{2:2}, at: irq_force_complete_move+0x2a/0x70 #2: b2c7cec0 (console_lock){+.+.}-{0:0}, at: printk+0x48/0x4a #3: b2c7cbe0 (console_owner){}-{0:0}, at: console_unlock+0x1af/0x650 stack backtrace: CPU: 2 PID: 20 Comm: migration/2 Not tainted 5.9.0-rc5+ #3 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x8b/0xb8 __lock_acquire.cold+0x1ce/0x333 ? stack_trace_save+0x3b/0x50 ? save_trace+0x3f/0x360 lock_acquire+0xbf/0x3e0 ? serial8250_console_write+0x8e/0x380 _raw_spin_lock_irqsave+0x48/0x60 ? serial8250_console_write+0x8e/0x380 serial8250_console_write+0x8e/0x380 ? console_unlock+0x1e8/0x650 console_unlock+0x3f3/0x650 ? printk+0x48/0x4a vprintk_emit+0x1b1/0x440 printk+0x48/0x4a irq_force_complete_move.cold+0xf/0x14 irq_migrate_all_off_this_cpu+0xfa/0x2f0 fixup_irqs+0x25/0xe8 cpu_disable_common+0x2b8/0x2d0 native_cpu_disable+0x18/0x30 take_cpu_down+0x2f/0xa0 multi_cpu_stop+0x6d/0x130 ? stop_machine_yield+0x10/0x10 cpu_stopper_thread+0x7b/0x110 ? smpboot_thread_fn+0x26/0x1e0 smpboot_thread_fn+0x10b/0x1e0 ? smpboot_register_percpu_thread+0xf0/0xf0 kthread+0x13a/0x150 ? kthread_create_worker_on_cpu+0x40/0x40 ret_from_fork+0x22/0x30 smpboot: CPU 2 is now offline = It was caused by printk() inside a code section protected by a raw_spin_lock() that ended up calling a serial console that uses a regular spin_lock(). Use the printk_deferred() to avoid calling the serial console in a raw_spin_lock() protected section. Signed-off-by: Daniel Bristot de Oliveira Suggested-by: Thomas Gleixner Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: Marc Zyngier Cc: Peter Xu Cc: Andy Lutomirski Cc: Bjorn Helgaas Cc: x...@kernel.org Cc: linux-kernel@vger.kernel.org --- arch/x86/kernel/apic/vector.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index f8a56b5dc29f..1a0e5535b8ac 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -406,8 +406,9 @@ static int activate_reserved(struct irq_data *irqd) */ if (!cpumask_subset(irq_data_get_effective_affinity_mask(irqd), irq_data_get_affinity_mask(irqd))) { - pr_warn("irq %u: Affinity broken due to vector space exhaustion.\n", - irqd->irq); + printk_deferred(KERN_WARNING + "irq %u: Affinity broken due to vector space exhaustion.\n", + irqd->irq); } return ret; @@ -1012,8 +1013,9 @@ void irq_force_complete_move(struct irq_desc *desc) * so we have the necessary information when a problem in that * area arises. */ - pr_warn("IRQ fixup: irq %d move in progress, old vector %d\n", - irqd->irq, vector); + printk_deferred(KERN_WARNING + "IRQ fixup: irq %d move in progress, old vector %d\n", + irqd->irq, vector); } free_moved_vector(apicd); unlock: @@ -1034,15 +1036,17 @@ int lapic_can_unplug_cpu(void) tomove = irq_matrix_allocated(vector_matrix); avl = irq_matrix_available(vector_matrix, true); if (avl < tomove) { - pr_warn("CPU %u has %u vectors, %u available. Cannot disable CPU\n", - cpu, tomove, avl); + printk_deferred(KERN_WARNING + "CPU %u has %u vectors, %u available. Cannot disable CPU\n", + cpu, tomove, avl); ret = -ENOSPC; goto out; } rsvd = irq_matrix_reserved(vector_matrix); if (avl < rsvd) { - pr_warn("Reserved vectors %u > available %u. IRQ request may fail\n", - rsvd, avl); + printk_deferred(KERN_WARNING + "Reserved vectors %u > available %u. IRQ request may fail\n", + rsvd, avl); } out: raw_spin_unlock(_lock); -- 2.26.2
[PATCH] sched/rt: Disable RT_RUNTIME_SHARE by default
The RT_RUNTIME_SHARE sched feature enables the sharing of rt_runtime between CPUs, allowing a CPU to run a real-time task up to 100% of the time while leaving more space for non-real-time tasks to run on the CPU that lend rt_runtime. The problem is that a CPU can easily borrow enough rt_runtime to allow a spinning rt-task to run forever, starving per-cpu tasks like kworkers, which are non-real-time by design. This patch disables RT_RUNTIME_SHARE by default, avoiding this problem. The feature will still be present for users that want to enable it, though. Signed-off-by: Daniel Bristot de Oliveira Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org --- kernel/sched/features.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/features.h b/kernel/sched/features.h index 7481cd96f391..68d369cba9e4 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -77,7 +77,7 @@ SCHED_FEAT(WARN_DOUBLE_CLOCK, false) SCHED_FEAT(RT_PUSH_IPI, true) #endif -SCHED_FEAT(RT_RUNTIME_SHARE, true) +SCHED_FEAT(RT_RUNTIME_SHARE, false) SCHED_FEAT(LB_MIN, false) SCHED_FEAT(ATTACH_AGE_LOAD, true) -- 2.26.2
[ANNOUNCE][CFP] Real-Time Linux Mini-Summit 2020 - Virtual
The Real-Time Mini-Summit is organized by the Linux Foundation Real-Time Linux (RTL) collaborative project. The event is intended to gather developers and users of Linux as a Real-Time Operating System. The main intent is to provide room for discussion between developers, tooling experts, and users. The mini-summit will take place alongside the Open Source Summit + Embedded Linux Conference Europe 2020, virtually. The summit is planned last day of the main conference, Friday, October 30th, 2020. If you are interested to present, please submit a proposal [1] before September 30th, 2020, at 23:59 EST. Please provide a title, an abstract describing the proposed talk, a short biography, and describe the targeted audience. Please indicate the slot length you are aiming for: The format is a single track with presentation slots of 15, 30 minutes long. Considering that the presentation should leave a space for discussion. We are welcoming presentations from both end-users and developers, on topics covering, but not limited to: - Real-time Linux development - Real-time Linux evaluation - Real-time Linux use cases (Success and failures) - Real-time Linux tooling (tracing, configuration, …) Those can cover recently available technologies, ongoing work, and new ideas. Important Notes for Speakers: All speakers are required to adhere to the Linux Foundation events’ Code of Conduct. We also highly recommend that speakers take the Linux Foundation online Inclusive Speaker Orientation Course. Avoid sales or marketing pitches and discussing unlicensed or potentially closed-source technologies when preparing your proposal; these talks are almost always rejected due to the fact that they take away from the integrity of our events, and are rarely well-received by conference attendees. All accepted speakers are required to submit their slides prior to the event. [1] Submission page: https://forms.gle/yQeqyrtJYezM5VRJA Important Dates: CFP Close: Saturday, September 30th, 2020, at 23:59 PST Speaker notification: October 10th, 2020 Conference: Friday, October 30th, 2020 Questions on submitting a proposal? Email Daniel Bristot de Oliveira
Re: [PATCH] sched/deadline: Fix stale throttling on de-/boosted tasks
On 9/2/20 11:43 AM, pet...@infradead.org wrote: > On Wed, Sep 02, 2020 at 08:00:24AM +0200, Juri Lelli wrote: >> On 31/08/20 13:07, Lucas Stach wrote: >>> When a boosted task gets throttled, what normally happens is that it's >>> immediately enqueued again with ENQUEUE_REPLENISH, which replenishes the >>> runtime and clears the dl_throttled flag. There is a special case however: >>> if the throttling happened on sched-out and the task has been deboosted in >>> the meantime, the replenish is skipped as the task will return to its >>> normal scheduling class. This leaves the task with the dl_throttled flag >>> set. >>> >>> Now if the task gets boosted up to the deadline scheduling class again >>> while it is sleeping, it's still in the throttled state. The normal wakeup >>> however will enqueue the task with ENQUEUE_REPLENISH not set, so we don't >>> actually place it on the rq. Thus we end up with a task that is runnable, >>> but not actually on the rq and neither a immediate replenishment happens, >>> nor is the replenishment timer set up, so the task is stuck in >>> forever-throttled limbo. >>> >>> Clear the dl_throttled flag before dropping back to the normal scheduling >>> class to fix this issue. >>> >>> Signed-off-by: Lucas Stach > >> Acked-by: Juri Lelli I faced a similar issue, but involving DL tasks (not !DL): https://lore.kernel.org/lkml/5076e003450835ec74e6fa5917d02c4fa41687e6.1600170294.git.bris...@redhat.com/ While debugging that problem, I reviewed and tested this patch, and we need it. So: Reviewed-by: Daniel Bristot de Oliveira Thanks! -- Daniel > Thanks! >
[PATCH] sched/deadline: Unthrottle PI boosted threads while enqueuing
stress-ng has a test (stress-ng --cyclic) that creates a set of threads under SCHED_DEADLINE with the following parameters: dl_runtime = 1 (10 us) dl_deadline = 10 (100 us) dl_period= 10 (100 us) These parameters are very aggressive. When using a system without HRTICK set, these threads can easily execute longer than the dl_runtime because the throttling happens with 1/HZ resolution. During the main part of the test, the system works just fine because the workload does not try to run over the 10 us. The problem happens at the end of the test, on the exit() path. During exit(), the threads need to do some cleanups that require real-time mutex locks, mainly those related to memory management, resulting in this scenario: Note: locks are rt_mutexes... TASK A: TASK B: TASK C: activation activation activation lock(a): OK!lock(b): OK! lock(a) -> block (task A owns it) -> self notice/set throttled +--< -> arm replenished timer | switch-out | lock(b) | -> B prio> | -> boost TASK B | unlock(a) switch-out | -> handle lock a to B |-> wakeup(B) | -> B is throttled: |-> do not enqueue | switch-out | | +-> replenishment timer -> TASK B is boosted: -> do not enqueue BOOM: TASK B is runnable but !enqueued, holding TASK C: the system crashes with hung task C. This problem is avoided by removing the throttle state from the boosted thread while boosting it (by TASK A in the example above), allowing it to be queued and run boosted. The next replenishment will take care of the runtime overrun, pushing the deadline further away. See the "while (dl_se->runtime <= 0)" on replenish_dl_entity() for more information. Signed-off-by: Daniel Bristot de Oliveira Reported-by: Mark Simmons Reviewed-by: Juri Lelli Tested-by: Mark Simmons Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: linux-kernel@vger.kernel.org --- kernel/sched/deadline.c | 21 + 1 file changed, 21 insertions(+) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 3862a28cd05d..50ba5fca0403 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1525,6 +1525,27 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags) */ if (pi_task && dl_prio(pi_task->normal_prio) && p->dl.dl_boosted) { pi_se = _task->dl; + /* +* Because of delays in the detection of the overrun of a +* thread's runtime, it might be the case that a thread +* goes to sleep in a rt mutex with negative runtime. As +* a consequence, the thread will be throttled. +* +* While waiting for the mutex, this thread can also be +* boosted via PI, resulting in a thread that is throttled +* and boosted at the same time. +* +* In this case, the boost overrides the throttle. +*/ + if (p->dl.dl_throttled) { + /* +* The replenish timer needs to be canceled. No +* problem if it fires concurrently: boosted threads +* are ignored in dl_task_timer(). +*/ + hrtimer_try_to_cancel(>dl.dl_timer); + p->dl.dl_throttled = 0; + } } else if (!dl_prio(p->normal_prio)) { /* * Special case in which we have a !SCHED_DEADLINE task -- 2.26.2
Re: [PATCH] MAINTAINERS: Add myself as SCHED_DEADLINE reviewer
On 9/4/20 2:36 PM, Ingo Molnar wrote: > Welcome Daniel! :-) > > I've applied the patch to tip:sched/core. Thanks Ingo! :D -- Daniel
[tip: sched/core] MAINTAINERS: Add myself as SCHED_DEADLINE reviewer
The following commit has been merged into the sched/core branch of tip: Commit-ID: 153908ebc8b5721658c4aba92779fc6e3e2d5a61 Gitweb: https://git.kernel.org/tip/153908ebc8b5721658c4aba92779fc6e3e2d5a61 Author:Daniel Bristot de Oliveira AuthorDate:Fri, 04 Sep 2020 11:26:23 +02:00 Committer: Ingo Molnar CommitterDate: Fri, 04 Sep 2020 14:35:40 +02:00 MAINTAINERS: Add myself as SCHED_DEADLINE reviewer As discussed with Juri and Peter. Signed-off-by: Daniel Bristot de Oliveira Signed-off-by: Ingo Molnar Acked-by: Peter Zijlstra Acked-by: Juri Lelli Link: https://lore.kernel.org/r/4476a6da70949913a59dab9aacfbd12162c1fbd7.1599146667.git.bris...@redhat.com --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index deaafb6..713f82b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15355,6 +15355,7 @@ R: Dietmar Eggemann (SCHED_NORMAL) R: Steven Rostedt (SCHED_FIFO/SCHED_RR) R: Ben Segall (CONFIG_CFS_BANDWIDTH) R: Mel Gorman (CONFIG_NUMA_BALANCING) +R: Daniel Bristot de Oliveira (SCHED_DEADLINE) L: linux-kernel@vger.kernel.org S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core
Re: [PATCH] MAINTAINERS: Add myself as SCHED_DEADLINE reviewer
On 9/4/20 11:53 AM, Juri Lelli wrote: > Thanks for the help Daniel! My pleasure! Thanks, Juri! -- Daniel
Re: [PATCH] MAINTAINERS: Add myself as SCHED_DEADLINE reviewer
On 9/4/20 11:39 AM, pet...@infradead.org wrote: > Absolutely, great to have you looking over that! Thanks Peter :-) -- Daniel
[PATCH] MAINTAINERS: Add myself as SCHED_DEADLINE reviewer
As discussed with Juri and Peter. Signed-off-by: Daniel Bristot de Oliveira Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: linux-kernel@vger.kernel.org --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 592467ba3f4d..56d185210a43 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15363,6 +15363,7 @@ R: Dietmar Eggemann (SCHED_NORMAL) R: Steven Rostedt (SCHED_FIFO/SCHED_RR) R: Ben Segall (CONFIG_CFS_BANDWIDTH) R: Mel Gorman (CONFIG_NUMA_BALANCING) +R: Daniel Bristot de Oliveira (SCHED_DEADLINE) L: linux-kernel@vger.kernel.org S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core -- 2.26.2
Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor
On 8/7/20 4:11 PM, pet...@infradead.org wrote: > But I shelved all that after I heard about that other balancer idea > Danial was suppose to be working on ;-))) The PhD bureaucracy (and behind the scenes) were blocking me... but I am free man now and will catch up on that ;-). [ also because I made enough progress on this: https://github.com/bristot/rtsl/ and can start other things. ] -- Daniel
Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor
On 8/7/20 12:46 PM, pet...@infradead.org wrote: > On Fri, Aug 07, 2020 at 11:56:04AM +0200, Juri Lelli wrote: >> Starting deadline server for lower priority classes right away when >> first task is enqueued might break guarantees, as tasks belonging to >> intermediate priority classes could be uselessly preempted. E.g., a well >> behaving (non hog) FIFO task can be preempted by NORMAL tasks even if >> there are still CPU cycles available for NORMAL tasks to run, as they'll >> be running inside the fair deadline server for some period of time. >> >> To prevent this issue, implement a starvation monitor mechanism that >> starts the deadline server only if a (fair in this case) task hasn't >> been scheduled for some interval of time after it has been enqueued. >> Use pick/put functions to manage starvation monitor status. > One thing I considerd was scheduling this as a least-laxity entity -- > such that it runs late, not early -- and start the server when > rq->nr_running != rq->cfs.h_nr_running, IOW when there's !fair tasks > around. > > Not saying we should do it like that, but that's perhaps more > deterministic than this. > I agree, what we want here is something that schedules the server if it still retains some runtime when the laxity is 0. But this is easier said than done, as this would require another scheduler (other pros and cons and analysis (and hours of work)...). But, for the starvation monitor purpose, the goal is not (necessarily) to provide a deterministic guarantee for the starving task, but to avoid system issues while minimizing the damage to the "real" real-time workload. With that in mind, we could relax our ambitions... Thoughts? -- Daniel
Re: [ANNOUNCE][CFP] Real-Time Micro Conference at LPC 2020
On 7/3/20 6:51 PM, Daniel Bristot de Oliveira wrote: > Proposals must be submitted by August 2nd, and submitters will be notified of > acceptance by August 9th. Because some people asked for more time to work on their proposals, we ended up leaving the submission open during this week, postponing the deadline to August 9th, with notifications not later than August 16th! [Feel free to ping me in case of doubts] Thanks! -- Daniel