[PATCH v10 06/10] drm_print: add choice to use dynamic debug in drm-debug
drm's debug system writes 10 distinct categories of messages to syslog using a small API[1]: drm_dbg*(10 names), DRM_DEV_DEBUG*(3 names), DRM_DEBUG*(8 names). There are thousands of these callsites, each categorized in this systematized way. These callsites can be enabled at runtime by their category, each controlled by a bit in drm.debug (/sys/modules/drm/parameter/debug). In the current "basic" implementation, drm_debug_enabled() tests these bits in __drm_debug each time an API[1] call is executed; while cheap individually, the costs accumulate with uptime. This patch uses dynamic-debug with (required) jump-label to patch enabled callsites onto their respective NOOP slots, avoiding all runtime bit-checks of __drm_debug by drm_debug_enabled(). Dynamic debug has no concept of category, but we can emulate one by replacing enum categories with a set of prefix-strings; "drm:core:", "drm:kms:" "drm:driver:" etc, and prepend them (at compile time) to the given formats. Then we can use: # echo module drm format "^drm:core: " +p > control` to enable the whole category with one query. This conversion yields many new prdbg callsites: dyndbg: 207 debug prints in module drm_kms_helper dyndbg: 376 debug prints in module drm dyndbg: 1811 debug prints in module i915 dyndbg: 3917 debug prints in module amdgpu Each site costs 56 bytes of .data, which is a big increase for drm modules, so CONFIG_DRM_USE_DYNAMIC_DEBUG makes it optional. CONFIG_JUMP_LABEL is also required, to get the promised optimizations. The "basic" -> "dyndbg" switchover is layered into the macro scheme A. A "prefix" version of DRM_UT_ map, named DRM_DBG_CAT_ "basic": DRM_DBG_CAT_ <=== DRM_UT_. Identity map. "dyndbg": #define DRM_DBG_CAT_KMS"drm:kms: " #define DRM_DBG_CAT_PRIME "drm:prime: " #define DRM_DBG_CAT_ATOMIC "drm:atomic: " DRM_UT_* are preserved, since theyre used elsewhere. Since the callback maintains its state in __drm_debug, drm_debug_enabled() will stay synchronized, and continue to work. We can address them separately if they are called enough to be worth fixing. B. drm_dev_dbg() & drm_debug() are interposed with macros basic:forward to renamed fn, with args preserved enabled: redirect to pr_debug, dev_dbg, with CATEGORY format catenated This is where drm_debug_enabled() is avoided. The prefix is prepended at compile-time, no category at runtime. C. API[1] uses DRM_DBG_CAT_s The API already uses B, now it uses A too, instead of DRM_UT_, to get the correct token type for "basic" and "dyndbg" configs. D. use DEFINE_DYNAMIC_DEBUG_CATEGORIES() This defines the map using DRM_CAT_s, and creates the /sysfs bitmap to control those categories. CONFIG_DRM_USE_DYNAMIC_DEBUG is also used to adjust amdgpu, i915 makefiles to add -DDYNAMIC_DEBUG_MODULE; it includes the current CONFIG_DYNAMIC_DEBUG_CORE and is enabled by the user. LIMITATIONS: dev_dbg(etal) effectively prepends twice, category then driver-name, yielding format strings like so: bash-5.1# grep amdgpu: /proc/dynamic_debug/control | grep drm: | cut -d= -f2- _ "amdgpu: drm:core: fence driver on ring %s use gpu addr 0x%016llx\012" _ "amdgpu: drm:kms: Cannot create framebuffer from imported dma_buf\012" This means we cannot use anchored "^drm:kms: " to specify the category, a small loss of precision. Note that searching on "format ^amdgpu: " works, but this is less valuable, because the same can be done with "module amdgpu". NOTES: Because the dyndbg callback is keeping state in __drm_debug, it synchronizes with drm_debug_enabled() and its remaining users; the switchover should be transparent. Code Review is expected to catch the lack of correspondence between bit=>prefix definitions (the selector) and the prefixes used in the API[1] layer above pr_debug() I've coded the categories with trailing spaces. This excludes any sub-categories which might get added later. This convention protects any "drm:atomic:fail:" callsites from getting stomped on by `echo 0 > debug`. Other categories could differ, but we need some default. Dyndbg requires that the prefix be in the compiled-in format string; run-time prefixing evades callsite selection by category. pr_debug("%s: ...", __func__, ...) // not ideal Unfortunately __func__ is not a macro, and cannot be catenated at preprocess/compile time. If you want that, you might consider +mfl flags instead;) Signed-off-by: Jim Cromie --- v5: . use DEFINE_DYNAMIC_DEBUG_CATEGORIES in drm_print.c . s/DRM_DBG_CLASS_/DRM_DBG_CAT_/ - dont need another term . default=y in Kconfig entry - per @DanVet . move some commit-log prose to dyndbg commit . add-prototyes to (param_get/set)_dyndbg . more wrinkles found by . relocate ratelimit chunk from elsewhere v6: . add kernel doc . fix cpp paste, drop '#' v7: . change __drm_debug to long, to fit with DEFINE_DYNAMIC_DEBUG_CATEGORIES . add -DDYNAMIC_DEBUG_MODULE to ccflags if DRM_USE_DYNAMIC_DEBUG v8: . adapt to altered ^ insertion . add mem cos
[PATCH v10 06/10] drm_print: add choice to use dynamic debug in drm-debug
drm's debug system writes 10 distinct categories of messages to syslog using a small API[1]: drm_dbg*(10 names), DRM_DEV_DEBUG*(3 names), DRM_DEBUG*(8 names). There are thousands of these callsites, each categorized in this systematized way. These callsites can be enabled at runtime by their category, each controlled by a bit in drm.debug (/sys/modules/drm/parameter/debug). In the current "basic" implementation, drm_debug_enabled() tests these bits in __drm_debug each time an API[1] call is executed; while cheap individually, the costs accumulate with uptime. This patch uses dynamic-debug with (required) jump-label to patch enabled callsites onto their respective NOOP slots, avoiding all runtime bit-checks of __drm_debug by drm_debug_enabled(). Dynamic debug has no concept of category, but we can emulate one by replacing enum categories with a set of prefix-strings; "drm:core:", "drm:kms:" "drm:driver:" etc, and prepend them (at compile time) to the given formats. Then we can use: # echo module drm format "^drm:core: " +p > control` to enable the whole category with one query. This conversion yields many new prdbg callsites: dyndbg: 207 debug prints in module drm_kms_helper dyndbg: 376 debug prints in module drm dyndbg: 1811 debug prints in module i915 dyndbg: 3917 debug prints in module amdgpu Each site costs 56 bytes of .data, which is a big increase for drm modules, so CONFIG_DRM_USE_DYNAMIC_DEBUG makes it optional. CONFIG_JUMP_LABEL is also required, to get the promised optimizations. The "basic" -> "dyndbg" switchover is layered into the macro scheme A. A "prefix" version of DRM_UT_ map, named DRM_DBG_CAT_ "basic": DRM_DBG_CAT_ <=== DRM_UT_. Identity map. "dyndbg": #define DRM_DBG_CAT_KMS"drm:kms: " #define DRM_DBG_CAT_PRIME "drm:prime: " #define DRM_DBG_CAT_ATOMIC "drm:atomic: " DRM_UT_* are preserved, since theyre used elsewhere. Since the callback maintains its state in __drm_debug, drm_debug_enabled() will stay synchronized, and continue to work. We can address them separately if they are called enough to be worth fixing. B. drm_dev_dbg() & drm_debug() are interposed with macros basic:forward to renamed fn, with args preserved enabled: redirect to pr_debug, dev_dbg, with CATEGORY format catenated This is where drm_debug_enabled() is avoided. The prefix is prepended at compile-time, no category at runtime. C. API[1] uses DRM_DBG_CAT_s The API already uses B, now it uses A too, instead of DRM_UT_, to get the correct token type for "basic" and "dyndbg" configs. D. use DEFINE_DYNAMIC_DEBUG_CATEGORIES() This defines the map using DRM_CAT_s, and creates the /sysfs bitmap to control those categories. CONFIG_DRM_USE_DYNAMIC_DEBUG is also used to adjust amdgpu, i915 makefiles to add -DDYNAMIC_DEBUG_MODULE; it includes the current CONFIG_DYNAMIC_DEBUG_CORE and is enabled by the user. LIMITATIONS: dev_dbg(etal) effectively prepends twice, category then driver-name, yielding format strings like so: bash-5.1# grep amdgpu: /proc/dynamic_debug/control | grep drm: | cut -d= -f2- _ "amdgpu: drm:core: fence driver on ring %s use gpu addr 0x%016llx\012" _ "amdgpu: drm:kms: Cannot create framebuffer from imported dma_buf\012" This means we cannot use anchored "^drm:kms: " to specify the category, a small loss of precision. Note that searching on "format ^amdgpu: " works, but this is less valuable, because the same can be done with "module amdgpu". NOTES: Because the dyndbg callback is keeping state in __drm_debug, it synchronizes with drm_debug_enabled() and its remaining users; the switchover should be transparent. Code Review is expected to catch the lack of correspondence between bit=>prefix definitions (the selector) and the prefixes used in the API[1] layer above pr_debug() I've coded the categories with trailing spaces. This excludes any sub-categories which might get added later. This convention protects any "drm:atomic:fail:" callsites from getting stomped on by `echo 0 > debug`. Other categories could differ, but we need some default. Dyndbg requires that the prefix be in the compiled-in format string; run-time prefixing evades callsite selection by category. pr_debug("%s: ...", __func__, ...) // not ideal Unfortunately __func__ is not a macro, and cannot be catenated at preprocess/compile time. If you want that, you might consider +mfl flags instead;) Signed-off-by: Jim Cromie --- v5: . use DEFINE_DYNAMIC_DEBUG_CATEGORIES in drm_print.c . s/DRM_DBG_CLASS_/DRM_DBG_CAT_/ - dont need another term . default=y in Kconfig entry - per @DanVet . move some commit-log prose to dyndbg commit . add-prototyes to (param_get/set)_dyndbg . more wrinkles found by . relocate ratelimit chunk from elsewhere v6: . add kernel doc . fix cpp paste, drop '#' v7: . change __drm_debug to long, to fit with DEFINE_DYNAMIC_DEBUG_CATEGORIES . add -DDYNAMIC_DEBUG_MODULE to ccflags if DRM_USE_DYNAMIC_DEBUG v8: . adapt to altered ^ insertion . add mem cos