Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV
On 2022/1/7 上午11:57, JingWen Chen wrote: > On 2022/1/7 上午3:13, Andrey Grodzovsky wrote: >> On 2022-01-06 12:18 a.m., JingWen Chen wrote: >>> On 2022/1/6 下午12:59, JingWen Chen wrote: On 2022/1/6 上午2:24, Andrey Grodzovsky wrote: > On 2022-01-05 2:59 a.m., Christian König wrote: >> Am 05.01.22 um 08:34 schrieb JingWen Chen: >>> On 2022/1/5 上午12:56, Andrey Grodzovsky wrote: On 2022-01-04 6:36 a.m., Christian König wrote: > Am 04.01.22 um 11:49 schrieb Liu, Monk: >> [AMD Official Use Only] >> See the FLR request from the hypervisor is just another source of signaling the need for a reset, similar to each job timeout on each queue. Otherwise you have a race condition between the hypervisor and the scheduler. >> No it's not, FLR from hypervisor is just to notify guest the hw VF >> FLR is about to start or was already executed, but host will do FLR >> anyway without waiting for guest too long >> > Then we have a major design issue in the SRIOV protocol and really > need to question this. > > How do you want to prevent a race between the hypervisor resetting > the hardware and the client trying the same because of a timeout? > > As far as I can see the procedure should be: > 1. We detect that a reset is necessary, either because of a fault a > timeout or signal from hypervisor. > 2. For each of those potential reset sources a work item is send to > the single workqueue. > 3. One of those work items execute first and prepares the reset. > 4. We either do the reset our self or notify the hypervisor that we > are ready for the reset. > 5. Cleanup after the reset, eventually resubmit jobs etc.. > 6. Cancel work items which might have been scheduled from other reset > sources. > > It does make sense that the hypervisor resets the hardware without > waiting for the clients for too long, but if we don't follow this > general steps we will always have a race between the different > components. Monk, just to add to this - if indeed as you say that 'FLR from hypervisor is just to notify guest the hw VF FLR is about to start or was already executed, but host will do FLR anyway without waiting for guest too long' and there is no strict waiting from the hypervisor for IDH_READY_TO_RESET to be recived from guest before starting the reset then setting in_gpu_reset and locking reset_sem from guest side is not really full proof protection from MMIO accesses by the guest - it only truly helps if hypervisor waits for that message before initiation of HW reset. >>> Hi Andrey, this cannot be done. If somehow guest kernel hangs and never >>> has the chance to send the response back, then other VFs will have to >>> wait it reset. All the vfs will hang in this case. Or sometimes the >>> mailbox has some delay and other VFs will also wait. The user of other >>> VFs will be affected in this case. >> Yeah, agree completely with JingWen. The hypervisor is the one in charge >> here, not the guest. >> >> What the hypervisor should do (and it already seems to be designed that >> way) is to send the guest a message that a reset is about to happen and >> give it some time to response appropriately. >> >> The guest on the other hand then tells the hypervisor that all >> processing has stopped and it is ready to restart. If that doesn't >> happen in time the hypervisor should eliminate the guest probably >> trigger even more severe consequences, e.g. restart the whole VM etc... >> >> Christian. > So what's the end conclusion here regarding dropping this particular > patch ? 
Seems to me we still need to drop it to prevent driver's MMIO > access > to the GPU during reset from various places in the code. > > Andrey > Hi Andrey & Christian, I have ported your patch(drop the reset_sem and in_gpu_reset in flr work) and run some tests. If a engine hang during an OCL benchmark(using kfd), we can see the logs below: >> >> Did you port the entire patchset or just 'drm/amd/virt: Drop concurrent GPU >> reset protection for SRIOV' ? >> >> > I ported the entire patchset [ 397.190727] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.301496] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.406601] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.532343] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.642251] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.746634] amdgpu :00:07.0: amdgpu: w
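For reference, the single-workqueue serialization Christian outlines above (detect, queue a work item per source, prepare, reset or notify the hypervisor, cleanup, cancel the other sources) can be sketched as below. This is not amdgpu code; every identifier is hypothetical and it only illustrates the ordering idea:

  #include <linux/workqueue.h>
  #include <linux/atomic.h>

  /* one ordered workqueue serializes all reset sources:
   * job timeout, RAS fault, hypervisor FLR notification */
  struct reset_domain {
      struct workqueue_struct *wq;    /* from alloc_ordered_workqueue() */
      atomic_t reset_pending;
  };

  struct reset_source {
      struct work_struct work;
      struct reset_domain *domain;
      /* source-specific context: timed-out job, FLR message, ... */
  };

  static void reset_work_fn(struct work_struct *w)
  {
      struct reset_source *src = container_of(w, struct reset_source, work);

      /* first queued source wins; later ones see 0 and bail (step 6) */
      if (!atomic_cmpxchg(&src->domain->reset_pending, 1, 0))
          return;

      /* steps 3-5: prepare, do the reset (or tell the hypervisor we are
       * ready for it), then clean up and resubmit jobs */
  }

  static void report_reset_source(struct reset_source *src)
  {
      atomic_set(&src->domain->reset_pending, 1);
      INIT_WORK(&src->work, reset_work_fn);
      queue_work(src->domain->wq, &src->work);    /* ordered: one at a time */
  }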
[PATCH v11 19/19] drm_print: use DEFINE_DYNAMIC_DEBUG_CLASSBITS for drm.debug
if CONFIG_DRM_USE_DYNDBG=y, use new macro to create the sysfs bitmap to control drm.debug callsites. DEFINE_DYNAMIC_DEBUG_CLASSBITS( debug, __drm_debug, "drm.debug - control summary", /* inline vector of dyndbg.class_id (uint:4, 0 is reserved) */ DRM_UT_CORE, DRM_UT_DRIVER, DRM_UT_KMS, DRM_UT_PRIME, DRM_UT_ATOMIC, DRM_UT_VBL, DRM_UT_STATE, DRM_UT_LEASE, DRM_UT_DP, DRM_UT_DRMRES ); NOTES: dyndbg.class_id is uint:4, values 1-15 are valid. 0 is reserved primarily for non-classified callsites (aka: prdbgs), and is thus available to mark the end of the vector (and is added by the macro). RFC: how to fold this in, with multiple macro expansion ? Signed-off-by: Jim Cromie --- drivers/gpu/drm/drm_print.c | 20 ++-- include/drm/drm_print.h | 2 +- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c index 0a15a4ec5ead..0de562e5306a 100644 --- a/drivers/gpu/drm/drm_print.c +++ b/drivers/gpu/drm/drm_print.c @@ -41,7 +41,7 @@ * __drm_debug: Enable debug output. * Bitmask of DRM_UT_x. See include/drm/drm_print.h for details. */ -unsigned int __drm_debug; +unsigned long __drm_debug; EXPORT_SYMBOL(__drm_debug); MODULE_PARM_DESC(debug, "Enable debug output, where each bit enables a debug category.\n" @@ -53,7 +53,23 @@ MODULE_PARM_DESC(debug, "Enable debug output, where each bit enables a debug cat "\t\tBit 5 (0x20) will enable VBL messages (vblank code)\n" "\t\tBit 7 (0x80) will enable LEASE messages (leasing code)\n" "\t\tBit 8 (0x100) will enable DP messages (displayport code)"); -module_param_named(debug, __drm_debug, int, 0600); + +#if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG) +module_param_named(debug, __drm_debug, ulong, 0600); +#else +DEFINE_DYNAMIC_DEBUG_CLASSBITS(debug, __drm_debug, + "enable drm.debug categories - 1 bit per category", + DRM_UT_CORE, + DRM_UT_DRIVER, + DRM_UT_KMS, + DRM_UT_PRIME, + DRM_UT_ATOMIC, + DRM_UT_VBL, + DRM_UT_STATE, + DRM_UT_LEASE, + DRM_UT_DP, + DRM_UT_DRMRES); +#endif void __drm_puts_coredump(struct drm_printer *p, const char *str) { diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h index 6d43b81a0ee4..ebed5ba2dc24 100644 --- a/include/drm/drm_print.h +++ b/include/drm/drm_print.h @@ -36,7 +36,7 @@ #include /* Do *not* use outside of drm_print.[ch]! */ -extern unsigned int __drm_debug; +extern unsigned long __drm_debug; /** * DOC: print -- 2.33.1
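As a usage sketch, a hypothetical non-DRM module could adopt the same interface. Only the macro name and argument order come from this series; the "foo" names are made up, and it assumes the module's pr_debug callsites were compiled with matching .class_id values (via the _CLS factory macros of patch 06):

  /* hypothetical module "foo" */
  #include <linux/module.h>
  #include <linux/dynamic_debug.h>

  enum foo_debug_category {    /* values become .class_id, so start at 1 */
      FOO_UT_CORE = 1,
      FOO_UT_IO,
      FOO_UT_IRQ,
  };

  static unsigned long __foo_debug;    /* bit 0 -> FOO_UT_CORE, bit 1 -> FOO_UT_IO, ... */

  DEFINE_DYNAMIC_DEBUG_CLASSBITS(debug, __foo_debug,
      "foo.debug - enable debug categories, 1 bit per category",
      FOO_UT_CORE, FOO_UT_IO, FOO_UT_IRQ);

Then "echo 0x3 > /sys/module/foo/parameters/debug" would be the bitmap equivalent of "module foo class 1 +p ; module foo class 2 +p" in the control file.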
[PATCH v11 12/19] drm_print: wrap drm_dbg in _dynamic_func_call_no_desc
Change __drm_dbg() macro to be CONFIG_DRM_USE_DYNAMIC_DEBUG dependent: N- keep straight mapping to exported ___drm_dbg() Y- wrap ___drm_dbg() inside _dynamic_func_call_no_desc() This patch places ~1/2 of drm.debug API messages behind dyndbg's JUMP_LABEL/NOOP optimization. The CONFIG_DRM_USE_DYNAMIC_DEBUG dependence is due to the .data footprint cost of per-callsite control; 56 bytes/site * ~2k,3k callsites (for i915, amdgpu), which is significant enough to make optional. bash-5.1# drms_load [7.489844] dyndbg: 239 debug prints in module drm [7.494010] ACPI: bus type drm_connector registered [7.546076] dyndbg: 81 debug prints in module drm_kms_helper [7.555723] dyndbg: 2 debug prints in module ttm [7.558920] dyndbg: 8 debug prints in module video [8.074699] dyndbg: 431 debug prints in module i915 [8.158682] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug. [8.574456] dyndbg: 3817 debug prints in module amdgpu [8.589962] [drm] amdgpu kernel modesetting enabled. [8.590548] amdgpu: CRAT table not found [8.590998] amdgpu: Virtual CRAT table created for CPU [8.591634] amdgpu: Topology: Add CPU node [8.636446] dyndbg: 3 debug prints in module wmi [8.768667] dyndbg: 24 debug prints in module nouveau Signed-off-by: Jim Cromie --- include/drm/drm_print.h | 34 +- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h index 1eef315a0a65..8d6b74270c50 100644 --- a/include/drm/drm_print.h +++ b/include/drm/drm_print.h @@ -319,10 +319,36 @@ enum drm_debug_category { DRM_UT_DRMRES }; +/* + * 3 name flavors of drm_debug_enabled: + * drm_debug_enabled - public/legacy, always checks bits + * _drm_debug_enabled - instrumented to observe call-rates, est overheads. + * __drm_debug_enabled - privileged - knows jump-label state, can short-circuit + */ static inline bool drm_debug_enabled(enum drm_debug_category category) { return unlikely(__drm_debug & BIT(category)); } +/* + * Wrap fn in macro, so that the pr_debug sees the actual caller, not + * the inline fn. Using this name creates a callsite entry / control + * point in /proc/dynamic_debug/control. + */ +#define _drm_debug_enabled(category) \ + ({ \ + pr_debug("todo: maybe avoid via dyndbg\n"); \ + drm_debug_enabled(category);\ + }) +#if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG) +/* + * dyndbg is wrapping the drm.debug API, so as to avoid the runtime + * bit-test overheads of drm_debug_enabled() in those api calls. + * In this case, executed callsites are known enabled, so true. + */ +#define __drm_debug_enabled(category) true +#else +#define __drm_debug_enabled(category) drm_debug_enabled(category) +#endif /* * struct device based logging @@ -497,7 +523,13 @@ void ___drm_dbg(enum drm_debug_category category, const char *format, ...); __printf(1, 2) void __drm_err(const char *format, ...); +#if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG) #define __drm_dbg(fmt, ...)___drm_dbg(fmt, ##__VA_ARGS__) +#else +#define __drm_dbg(eCat, fmt, ...) \ + _dynamic_func_call_no_desc(fmt, ___drm_dbg, \ + eCat, fmt, ##__VA_ARGS__) +#endif /* Macros to make printk easier */ @@ -569,7 +601,7 @@ void __drm_err(const char *format, ...); static DEFINE_RATELIMIT_STATE(rs_, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);\ const struct drm_device *drm_ = (drm); \ \ - if (drm_debug_enabled(DRM_UT_ ## category) && __ratelimit(&rs_)) \ + if (__drm_debug_enabled(DRM_UT_ ## category) && __ratelimit(&rs_)) \ drm_dev_printk(drm_ ? drm_->dev : NULL, KERN_DEBUG, fmt, ## __VA_ARGS__); \ }) -- 2.33.1
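Roughly, each wrapped callsite now amounts to a per-site descriptor plus a static-key-guarded call. This is a simplified sketch, not the literal expansion of _dynamic_func_call_no_desc(), modeled on the DEFINE_DYNAMIC_DEBUG_METADATA() initializer shown with patch 06:

  do {    /* approximate shape of one expanded drm_dbg(DRM_UT_DRIVER, ...) */
      static struct _ddebug __aligned(8) __section("__dyndbg") id = {
          .modname = KBUILD_MODNAME,
          .function = __func__,
          .filename = __FILE__,
          .format = "some format\n",
          .lineno = __LINE__,
          .flags = _DPRINTK_FLAGS_DEFAULT,
          _DPRINTK_KEY_INIT
      };
      if (DYNAMIC_DEBUG_BRANCH(id))    /* JUMP_LABEL: a NOOP until enabled */
          ___drm_dbg(DRM_UT_DRIVER, "some format\n");
  } while (0);

The static struct _ddebug above is the 56-byte per-callsite .data cost that makes this worth keeping configurable.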
[PATCH v11 14/19] drm_print: prefer bare printk KERN_DEBUG on generic fn
drm_print.c calls pr_debug() just once, from __drm_printfn_debug(), which is a generic/service fn. The callsite is compile-time enabled by DEBUG in both DYNAMIC_DEBUG=y/n builds. For dyndbg builds, reverting this callsite back to bare printk corrects a few anti-features: 1- the callsite is generic, and serves multiple drm users. It is hardwired on currently, but could accidentally be disabled via: #> echo -p > /proc/dynamic_debug/control 2- the optional "decorations" added by dyndbg are unhelpful/misleading; they describe only the generic site, not the end users. IOW, 1 and 2 are unhelpful at best, and possibly confusing. Reverting yields a nominal data and text shrink: text data bss dec hex filename 462583 36604 54592 553779 87333 /lib/modules/5.16.0-rc4-lm1-8-ged3eac8ceeea/kernel/drivers/gpu/drm/drm.ko 462515 36532 54592 553639 872a7 /lib/modules/5.16.0-rc4-lm1-9-g6ce0b88d2539-dirty/kernel/drivers/gpu/drm/drm.ko NB: this was noticed using _drm_debug_enabled(), added earlier. Signed-off-by: Jim Cromie --- drivers/gpu/drm/drm_print.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c index aab29dd6cad1..b911f949af5b 100644 --- a/drivers/gpu/drm/drm_print.c +++ b/drivers/gpu/drm/drm_print.c @@ -23,8 +23,6 @@ * Rob Clark */ -#define DEBUG /* for pr_debug() */ - #include #include @@ -165,7 +163,8 @@ EXPORT_SYMBOL(__drm_printfn_info); void __drm_printfn_debug(struct drm_printer *p, struct va_format *vaf) { - pr_debug("%s %pV", p->prefix, vaf); + /* pr_debug callsite decorations are unhelpful here */ + printk(KERN_DEBUG "%s %pV", p->prefix, vaf); } EXPORT_SYMBOL(__drm_printfn_debug); -- 2.33.1
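For context, __drm_printfn_debug() is normally reached through a struct drm_printer rather than called directly, e.g. (assuming the existing drm_debug_printer() helper; the "foo" function is hypothetical):

  #include <drm/drm_print.h>

  static void foo_dump_state(int state)
  {
      struct drm_printer p = drm_debug_printer("foo");  /* .printfn = __drm_printfn_debug */

      drm_printf(&p, "state: %d\n", state);
  }

Every such user funnels through that one shared callsite, so a dyndbg entry there would control all of them at once rather than any particular caller, which is why plain printk fits better here.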
[PATCH v11 16/19] drm_print: add struct _ddebug desc to drm_*dbg
A recent commit adding trace-events to drm noted: trace_*() additions are strictly redundant with printks to syslog, not properly placed to reduce overall work. That was because it didn't have the struct _ddebug *descriptor, whose .flags specify TRACE and PRINTK actions on a controlled callsite. So it could only duplicate the stream (from the enabled callsites). To issue TRACE, PRINTK selectively: - add struct _ddebug * param to prototypes and functions: ___drm_dbg(), __drm_dev_dbg() add "struct _ddebug;" to name the ptr-type, to silence warning. - adjust the forwarding macros: drm_dbg, drm_dev_dbg to provide a descriptor, or NULL. basically this is dropping the _no_desc_, ie using _dynamic_func_call_cls(), since the callee can now accept it. - add if (desc->flags ...) TRACE / PRINTK actions. Signed-off-by: Jim Cromie --- drivers/gpu/drm/drm_print.c | 25 - include/drm/drm_print.h | 20 ++-- 2 files changed, 26 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c index b911f949af5b..8c33302212fc 100644 --- a/drivers/gpu/drm/drm_print.c +++ b/drivers/gpu/drm/drm_print.c @@ -258,8 +258,8 @@ void drm_dev_printk(const struct device *dev, const char *level, } EXPORT_SYMBOL(drm_dev_printk); -void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category, - const char *format, ...) +void __drm_dev_dbg(struct _ddebug *desc, const struct device *dev, + enum drm_debug_category category, const char *format, ...) { struct va_format vaf; va_list args; @@ -267,24 +267,31 @@ void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category, if (!__drm_debug_enabled(category)) return; + /* we know we are printing for either syslog, tracefs, or both */ va_start(args, format); vaf.fmt = format; vaf.va = &args; if (dev) { - dev_printk(KERN_DEBUG, dev, "[" DRM_NAME ":%ps] %pV", - __builtin_return_address(0), &vaf); - trace_drm_devdbg(dev, category, &vaf); + if (desc->flags & _DPRINTK_FLAGS_PRINTK) + dev_printk(KERN_DEBUG, dev, "[" DRM_NAME ":%ps] %pV", + __builtin_return_address(0), &vaf); + + if (desc->flags & _DPRINTK_FLAGS_TRACE) + trace_drm_devdbg(dev, category, &vaf); } else { - printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV", - __builtin_return_address(0), &vaf); - trace_drm_debug(category, &vaf); + if (desc->flags & _DPRINTK_FLAGS_PRINTK) + printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV", + __builtin_return_address(0), &vaf); + + if (desc->flags & _DPRINTK_FLAGS_TRACE) + trace_drm_debug(category, &vaf); } va_end(args); } EXPORT_SYMBOL(__drm_dev_dbg); -void ___drm_dbg(enum drm_debug_category category, const char *format, ...) +void ___drm_dbg(struct _ddebug *desc, enum drm_debug_category category, const char *format, ...) { struct va_format vaf; va_list args; diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h index 0c704610c770..6d43b81a0ee4 100644 --- a/include/drm/drm_print.h +++ b/include/drm/drm_print.h @@ -360,9 +360,9 @@ static inline bool drm_debug_enabled(enum drm_debug_category category) __printf(3, 4) void drm_dev_printk(const struct device *dev, const char *level, const char *format, ...); -__printf(3, 4) -void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category, -const char *format, ...); +__printf(4, 5) +void __drm_dev_dbg(struct _ddebug *desc, const struct device *dev, + enum drm_debug_category category, const char *format, ...); /** * DRM_DEV_ERROR() - Error output. 
@@ -412,11 +412,11 @@ void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category, #if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG) #define drm_dev_dbg(dev, eCat, fmt, ...) \ - __drm_dev_dbg(dev, eCat, fmt, ##__VA_ARGS__) + __drm_dev_dbg(NULL, dev, eCat, fmt, ##__VA_ARGS__) #else #define drm_dev_dbg(dev, eCat, fmt, ...) \ - _dynamic_func_call_no_desc_cls(fmt, eCat, __drm_dev_dbg,\ - dev, eCat, fmt, ##__VA_ARGS__) + _dynamic_func_call_cls(eCat, fmt, __drm_dev_dbg,\ + dev, eCat, fmt, ##__VA_ARGS__) #endif /** @@ -519,8 +519,8 @@ void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category, * Prefer drm_device based logging over device or prink based logging. */ -__printf(2, 3) -void ___drm_dbg(enum drm_debug_category category, const c
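For example, after "echo module drm +T > /proc/dynamic_debug/control" a callsite's flags read =T in the control file, so __drm_dev_dbg() takes only the trace_drm_devdbg() branch; +pT selects both sinks, and with no flags set the static key keeps the function from being entered at all (in the CONFIG_DRM_USE_DYNAMIC_DEBUG=y case).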
[PATCH v11 18/19] dyndbg: add DEFINE_DYNAMIC_DEBUG_CLASSBITS macro and callbacks
DEFINE_DYNAMIC_DEBUG_CLASSBITS(fsname, var, bitmap_desc, classes..) allows users to create a drm.debug style (bitmap) sysfs interface, to control sets of pr_debug's according to their .class_id's 1st, due to a recent commit, this already works: echo "module drm class 1 +p ; module drm class 3 +p" >control With the macro, this is basically equivalent: # this also turns off all other classes. echo 0x05 > /sys/module/drm/parameters/debug To use: DEFINE_DYNAMIC_DEBUG_CLASSBITS(debug, __drm_debug, "drm.debug - bits => categories:", /* vector of uint:4 symbols, ala enum drm_debug_category */ DRM_UT_CORE, DRM_UT_DRIVER, DRM_UT_KMS ... ); To support the macro, the patch includes: - int param_set_dyndbg_classbits() - int param_get_dyndbg_classbits() - struct kernel_param_ops param_ops_dyndbg_classbits Following the model of kernel/params.c STANDARD_PARAM_DEFS, these are non-static and exported. get/set use an augmented kernel_param; the arg refs a new struct dyndbg_bitmap_param containing: A- the vector of classes (drm.debug "categories") being controlled For CONFIG_DRM_USE_DYNDBG=y, enum drm_debug_category is initialized into (struct _ddebug).class_id, so its already available to select on. B- a pointer to the user module's ulong holding the bits/state. By sharing bit-state in __drm_debug, we coordinate with existing code that still uses drm_debug_enabled(), so they work unchanged. NOTES: param_set_dyndbg_classbits() compares new vs old bits, and only updates each class on changes. This maximally preserves the underlying state, which may have been customized via later `echo $cmd >control`. So if a user really wants to know that all prdbgs are set precisely, they must pre-clear then set. Signed-off-by: Jim Cromie --- include/linux/dynamic_debug.h | 45 - lib/dynamic_debug.c | 75 +++ 2 files changed, 119 insertions(+), 1 deletion(-) diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index e9483cd9ac1c..318ac44a0d4a 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -20,7 +20,9 @@ struct _ddebug { const char *function; const char *filename; const char *format; - unsigned int class_id:4; +#define CLS_BITS 4 +#define DD_MAX_CLASSES (1 << CLS_BITS) + unsigned int class_id:CLS_BITS; unsigned int lineno:18; /* * The flags field controls the behaviour at the callsite. 
@@ -202,6 +204,10 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor, KERN_DEBUG, prefix_str, prefix_type, \ rowsize, groupsize, buf, len, ascii) +struct kernel_param; +int param_set_dyndbg_classbits(const char *instr, const struct kernel_param *kp); +int param_get_dyndbg_classbits(char *buffer, const struct kernel_param *kp); + #else /* !CONFIG_DYNAMIC_DEBUG_CORE */ #include @@ -248,6 +254,43 @@ static inline int dynamic_debug_exec_queries(const char *query, const char *modn return 0; } +struct kernel_param; +static inline int param_set_dyndbg_classbits(const char *instr, const struct kernel_param *kp) +{ return 0; } +static inline int param_get_dyndbg_classbits(char *buffer, const struct kernel_param *kp) +{ return 0; } + #endif /* !CONFIG_DYNAMIC_DEBUG_CORE */ +struct dyndbg_classbits_param { + unsigned long *bits;/* ref to shared state */ + const int classes[];/* indexed by bitpos */ +}; + +#if defined(CONFIG_DYNAMIC_DEBUG) || defined(CONFIG_DYNAMIC_DEBUG_CORE) +/** + * DEFINE_DYNAMIC_DEBUG_CLASSBITS() - bitmap control of classed pr_debugs + * @sysname: sysfs-node name + * @_var:C-identifier holding bit-vector (Bits 0-15 are usable) + * @desc:string summarizing the controls provided + * @classes: vector of callsite.class_id's (uint:4, 0 is reserved) + * + * This macro implements a DRM.debug API style bitmap, mapping bits + * 0-15 to classes of prdbg's, as initialized in their .class_id fields. + */ +#define DEFINE_DYNAMIC_DEBUG_CLASSBITS(fsname, _var, desc, ...) \ + MODULE_PARM_DESC(fsname, desc); \ + static struct dyndbg_classbits_param ddcats_##_var = { \ + .bits = &(_var), .classes = { __VA_ARGS__, 0 } }; \ + module_param_cb(fsname, ¶m_ops_dyndbg_classbits, &ddcats_##_var, 0644) + +extern const struct kernel_param_ops param_ops_dyndbg_classbits; + +#else /* no dyndbg configured, throw error on macro use */ + +#define DEFINE_DYNAMIC_DEBUG_CLASSBITS(fsname, var, bitmap_desc, ...) \ + BUILD_BUG_ON_MSG(1, "CONFIG_DYNAMIC_DEBUG|_CORE && -DDYNAMIC_DEBUG_MODULE needed to use this macro: " #fsname #var) + +#endif + #endif diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c index 419d4664e724..40ca7973a0f8 100644 --- a/lib/dynamic_debug.c +++ b/lib/dynamic_debug.c @@
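The lib/dynamic_debug.c half of the diff is cut off above. As a rough sketch only (not the actual implementation; names follow the header hunk, and module scoping plus input validation are glossed over), the set callback described in the NOTES could be built on dynamic_debug_exec_queries(), touching only the classes whose bit changed:

  /* sketch; would live in lib/dynamic_debug.c */
  int param_set_dyndbg_classbits(const char *instr, const struct kernel_param *kp)
  {
      const struct dyndbg_classbits_param *p = kp->arg;
      unsigned long inbits, old = *p->bits;
      char query[32];
      int rc, i;

      rc = kstrtoul(instr, 0, &inbits);
      if (rc)
          return rc;

      for (i = 0; p->classes[i]; i++) {    /* vector is 0-terminated */
          if (test_bit(i, &inbits) == test_bit(i, &old))
              continue;    /* preserve unchanged classes */
          snprintf(query, sizeof(query), "class %d %cp",
                   p->classes[i], test_bit(i, &inbits) ? '+' : '-');
          dynamic_debug_exec_queries(query, NULL);
      }
      *p->bits = inbits;
      return 0;
  }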
[PATCH v11 17/19] drm_print: add struct _ddebug *desc to trace-drm-*() params
Replace trace-drm-*()s category param with struct _ddebug *desc; it has .classid field, which is the category. This brings the events closer in line with those added for dyndbg; at least the 1st param, and possibly the struct device (tb-checked). There are still differences in the tail of the prototypes; vaf vs text + len, which probably breaks CLASS sharing. Signed-off-by: Jim Cromie --- drivers/gpu/drm/drm_print.c | 6 +++--- include/trace/events/drm.h | 36 ++-- 2 files changed, 21 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c index 8c33302212fc..0a15a4ec5ead 100644 --- a/drivers/gpu/drm/drm_print.c +++ b/drivers/gpu/drm/drm_print.c @@ -278,14 +278,14 @@ void __drm_dev_dbg(struct _ddebug *desc, const struct device *dev, __builtin_return_address(0), &vaf); if (desc->flags & _DPRINTK_FLAGS_TRACE) - trace_drm_devdbg(dev, category, &vaf); + trace_drm_devdbg(dev, desc, &vaf); } else { if (desc->flags & _DPRINTK_FLAGS_PRINTK) printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV", __builtin_return_address(0), &vaf); if (desc->flags & _DPRINTK_FLAGS_TRACE) - trace_drm_debug(category, &vaf); + trace_drm_debug(desc, &vaf); } va_end(args); } @@ -306,7 +306,7 @@ void ___drm_dbg(struct _ddebug *desc, enum drm_debug_category category, const ch printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV", __builtin_return_address(0), &vaf); - trace_drm_debug(category, &vaf); + trace_drm_debug(desc, &vaf); va_end(args); } diff --git a/include/trace/events/drm.h b/include/trace/events/drm.h index 944aedaf6aa6..bfe1fff923d8 100644 --- a/include/trace/events/drm.h +++ b/include/trace/events/drm.h @@ -9,25 +9,25 @@ /* drm_debug() was called, pass its args */ TRACE_EVENT(drm_debug, - TP_PROTO(int drm_debug_category, struct va_format *vaf), + TP_PROTO(struct _ddebug *desc, struct va_format *vaf), - TP_ARGS(drm_debug_category, vaf), + TP_ARGS(desc, vaf), TP_STRUCT__entry( - __field(int, drm_debug_category) + __field(struct _ddebug *, desc) __dynamic_array(char, msg, 256) ), TP_fast_assign( int len; + char *p = __get_str(msg); - __entry->drm_debug_category = drm_debug_category; - vsnprintf(__get_str(msg), 256, vaf->fmt, *vaf->va); + __entry->desc = desc; + len = vsnprintf(p, 256, vaf->fmt, *vaf->va); - len = strlen(__get_str(msg)); - if ((len > 0) && (__get_str(msg)[len-1] == '\n')) + if ((len > 0) && (len < 256) && p[len-1] == '\n') len -= 1; - __get_str(msg)[len] = 0; + p[len] = 0; ), TP_printk("%s", __get_str(msg)) @@ -35,30 +35,30 @@ TRACE_EVENT(drm_debug, /* drm_devdbg() was called, pass its args, preserving order */ TRACE_EVENT(drm_devdbg, - TP_PROTO(const struct device *dev, int drm_debug_category, struct va_format *vaf), + TP_PROTO(const struct device *dev, struct _ddebug *desc, struct va_format *vaf), - TP_ARGS(dev, drm_debug_category, vaf), + TP_ARGS(dev, desc, vaf), TP_STRUCT__entry( - __field(const struct device*, dev) - __field(int, drm_debug_category) + __field(const struct device *, dev) + __field(struct _ddebug *, desc) __dynamic_array(char, msg, 256) ), TP_fast_assign( int len; + char *p = __get_str(msg); - __entry->drm_debug_category = drm_debug_category; + __entry->desc = desc; __entry->dev = dev; - vsnprintf(__get_str(msg), 256, vaf->fmt, *vaf->va); + len = vsnprintf(p, 256, vaf->fmt, *vaf->va); - len = strlen(__get_str(msg)); - if ((len > 0) && (__get_str(msg)[len-1] == '\n')) + if ((len > 0) && (len < 256) && p[len-1] == '\n') len -= 1; - __get_str(msg)[len] = 0; + p[len] = 0; ), - TP_printk("cat:%d, %s %s", __entry->drm_debug_category, + TP_printk("cat:%d, %s 
%s", __entry->desc->class_id, dev_name(__entry->dev), __get_str(msg)) ); -- 2.33.1
[PATCH v11 13/19] drm_print: refine drm_debug_enabled for dyndbg+jump-label
In order to use dynamic-debug's jump-label optimization in drm-debug, its clarifying to refine drm_debug_enabled into 3 uses: 1. drm_debug_enabled - legacy, public 2. __drm_debug_enabled - optimized for dyndbg jump-label enablement. 3. _drm_debug_enabled - pr_debug instrumented, observable 1. The legacy version always checks the bits. 2. is privileged, for use by __drm_dbg(), __drm_dev_dbg(), which do an early return unless the category is enabled (free of call/NOOP side effects). For dyndbg builds, debug callsites are selectively "pre-enabled", so __drm_debug_enabled() short-circuits to true there. Remaining callers of 1 may be able to use 2, case by case. 3. is 1st wrapped in a macro, with a pr_debug, which reports each usage in /proc/dynamic_debug/control, making it observable in the logs. The macro lets the pr_debug see the real caller, not the inline function. When plugged into 1, it identified ~10 remaining callers of the function, leading to the follow-on cleanup patch, and would allow activating the pr_debugs, estimating the callrate, and the potential savings by using the wrapper macro. It is unused ATM, but it fills out the picture. Signed-off-by: Jim Cromie --- drivers/gpu/drm/drm_print.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c index 5dd6713c14f2..aab29dd6cad1 100644 --- a/drivers/gpu/drm/drm_print.c +++ b/drivers/gpu/drm/drm_print.c @@ -265,7 +265,7 @@ void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category, struct va_format vaf; va_list args; - if (!drm_debug_enabled(category)) + if (!__drm_debug_enabled(category)) return; va_start(args, format); @@ -290,7 +290,7 @@ void ___drm_dbg(enum drm_debug_category category, const char *format, ...) struct va_format vaf; va_list args; - if (!drm_debug_enabled(category)) + if (!__drm_debug_enabled(category)) return; va_start(args, format); -- 2.33.1
[PATCH v11 15/19] drm_print: use _dynamic_func_call_no_desc_cls
Upgrade the current use of _dynamic_func_call_no_desc(), adding the suffix and the category arg. The arg has been available via the macros implementing the drm.debug api, but dyndbg lacked a simple way to represent it and use it until recently. Signed-off-by: Jim Cromie --- include/drm/drm_print.h | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h index 8d6b74270c50..0c704610c770 100644 --- a/include/drm/drm_print.h +++ b/include/drm/drm_print.h @@ -31,6 +31,7 @@ #include #include #include +#include #include @@ -414,8 +415,8 @@ void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category, __drm_dev_dbg(dev, eCat, fmt, ##__VA_ARGS__) #else #define drm_dev_dbg(dev, eCat, fmt, ...) \ - _dynamic_func_call_no_desc(fmt, __drm_dev_dbg, \ - dev, eCat, fmt, ##__VA_ARGS__) + _dynamic_func_call_no_desc_cls(fmt, eCat, __drm_dev_dbg,\ + dev, eCat, fmt, ##__VA_ARGS__) #endif /** @@ -524,11 +525,11 @@ __printf(1, 2) void __drm_err(const char *format, ...); #if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG) -#define __drm_dbg(fmt, ...)___drm_dbg(fmt, ##__VA_ARGS__) +#define __drm_dbg(fmt, ...)___drm_dbg(NULL, fmt, ##__VA_ARGS__) #else #define __drm_dbg(eCat, fmt, ...) \ - _dynamic_func_call_no_desc(fmt, ___drm_dbg, \ - eCat, fmt, ##__VA_ARGS__) + _dynamic_func_call_no_desc_cls(fmt, eCat, ___drm_dbg, \ + eCat, fmt, ##__VA_ARGS__) #endif /* Macros to make printk easier */ -- 2.33.1
[PATCH v11 11/19] drm_print: wrap drm_dev_dbg in _dynamic_func_call_no_desc
make drm_dev_dbg() macro CONFIG_DRM_USE_DYNAMIC_DEBUG dependent: =N direct forwarding: drm_dev_dbg -> __drm_dev_dbg() =Y wrap __drm_dev_dbg in _dynamic_func_call_no_desc(). This adds the metadata which creates a /proc/dynamic_debug/control entry for each callsite, and exposes it for selective enablement. NB: include header in header so wrapper is available proof of function: bash-5.1# grep 'ring test' /proc/dynamic_debug/control drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:453 [amdgpu]amdgpu_ring_test_helper =xf "ring test on %s succeeded\012" less 452 else 453 DRM_DEV_DEBUG(adev->dev, "ring test on %s succeeded\n", 454 ring->name); drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c lines 417-454/458 byte 12536/12575 100% (press RETURN) new prdbg callsite counts: bash-5.1# drms_load [ 20.533990] dyndbg: 1 debug prints in module drm [ 20.535637] ACPI: bus type drm_connector registered [ 20.624951] dyndbg: 2 debug prints in module ttm [ 20.627853] dyndbg: 8 debug prints in module video [ 20.922442] dyndbg: 127 debug prints in module i915 [ 20.984159] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug. [ 21.310216] dyndbg: 2196 debug prints in module amdgpu [ 21.324537] [drm] amdgpu kernel modesetting enabled. [ 21.325092] amdgpu: CRAT table not found [ 21.325512] amdgpu: Virtual CRAT table created for CPU [ 21.326009] amdgpu: Topology: Add CPU node [ 21.401137] dyndbg: 3 debug prints in module wmi [ 21.532540] dyndbg: 3 debug prints in module nouveau Signed-off-by: Jim Cromie --- include/drm/drm_print.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h index f8fa5af11310..1eef315a0a65 100644 --- a/include/drm/drm_print.h +++ b/include/drm/drm_print.h @@ -383,8 +383,14 @@ void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category, } \ }) +#if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG) #define drm_dev_dbg(dev, eCat, fmt, ...) \ __drm_dev_dbg(dev, eCat, fmt, ##__VA_ARGS__) +#else +#define drm_dev_dbg(dev, eCat, fmt, ...) \ + _dynamic_func_call_no_desc(fmt, __drm_dev_dbg, \ + dev, eCat, fmt, ##__VA_ARGS__) +#endif /** * DRM_DEV_DEBUG() - Debug output for generic drm code -- 2.33.1
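As a caller-side sketch (hypothetical driver code; only DRM_DEV_DEBUG() and the control-file syntax come from the patch):

  /* in some drm driver, module "foo" */
  DRM_DEV_DEBUG(dev, "probing %s\n", dev_name(dev));

  /* with CONFIG_DRM_USE_DYNAMIC_DEBUG=y this is a NOOP until, e.g.:
   *   echo module foo +p > /proc/dynamic_debug/control
   */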
[PATCH v11 08/19] drm_print: add trace_drm_dbg, trace_drm_devdbg events
__drm_dbg() and drm_dev_dbg() currently printk to syslog. These 2 underlay the vast bulk of DRM.debug api calls; they are a significant source of debugging info, and could add useful context to debug traces. Wire them to emit 2 new trace_*() events: drm_debug and drm_devdbg. These events keep the args of those 2 callers: - int/enum category, va_format *vaf - struct device *dev, int/enum category, va_format *vaf ISTM best to reflect args thru w/o altering; it is simple, least surprising, and preserves info for possible filtering/selecting events. NOTES: trace_*() additions are strictly redundant with printks to syslog, not properly placed to reduce overall work. Reuses the trim-trailing-newline trick on the vsnprintf'd message. TLDR: The event called by drm_dev_dbg() depends upon !!dev; there's little value to storing a null in the trace. Yes, one could know that devdbg was called with a null, but is that worthwhile? And if you really needed to know the call (not available from control-file format) the file:line gets you there. Signed-off-by: Jim Cromie --- drivers/gpu/drm/drm_print.c | 13 +-- include/trace/events/drm.h | 68 + 2 files changed, 78 insertions(+), 3 deletions(-) create mode 100644 include/trace/events/drm.h diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c index f783d4963d4b..cfcb89ffd89d 100644 --- a/drivers/gpu/drm/drm_print.c +++ b/drivers/gpu/drm/drm_print.c @@ -36,6 +36,9 @@ #include #include +#define CREATE_TRACE_POINTS +#include + /* * __drm_debug: Enable debug output. * Bitmask of DRM_UT_x. See include/drm/drm_print.h for details. */ @@ -269,13 +272,15 @@ void drm_dev_dbg(const struct device *dev, enum drm_debug_category category, vaf.fmt = format; vaf.va = &args; - if (dev) + if (dev) { dev_printk(KERN_DEBUG, dev, "[" DRM_NAME ":%ps] %pV", __builtin_return_address(0), &vaf); - else + trace_drm_devdbg(dev, category, &vaf); + } else { printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV", __builtin_return_address(0), &vaf); - + trace_drm_debug(category, &vaf); + } va_end(args); } EXPORT_SYMBOL(drm_dev_dbg); @@ -295,6 +300,8 @@ void __drm_dbg(enum drm_debug_category category, const char *format, ...)
printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV", __builtin_return_address(0), &vaf); + trace_drm_debug(category, &vaf); + va_end(args); } EXPORT_SYMBOL(__drm_dbg); diff --git a/include/trace/events/drm.h b/include/trace/events/drm.h new file mode 100644 index ..944aedaf6aa6 --- /dev/null +++ b/include/trace/events/drm.h @@ -0,0 +1,68 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM drm + +#if !defined(_TRACE_DRM_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_DRM_H + +#include + +/* drm_debug() was called, pass its args */ +TRACE_EVENT(drm_debug, + TP_PROTO(int drm_debug_category, struct va_format *vaf), + + TP_ARGS(drm_debug_category, vaf), + + TP_STRUCT__entry( + __field(int, drm_debug_category) + __dynamic_array(char, msg, 256) + ), + + TP_fast_assign( + int len; + + __entry->drm_debug_category = drm_debug_category; + vsnprintf(__get_str(msg), 256, vaf->fmt, *vaf->va); + + len = strlen(__get_str(msg)); + if ((len > 0) && (__get_str(msg)[len-1] == '\n')) + len -= 1; + __get_str(msg)[len] = 0; + ), + + TP_printk("%s", __get_str(msg)) +); + +/* drm_devdbg() was called, pass its args, preserving order */ +TRACE_EVENT(drm_devdbg, + TP_PROTO(const struct device *dev, int drm_debug_category, struct va_format *vaf), + + TP_ARGS(dev, drm_debug_category, vaf), + + TP_STRUCT__entry( + __field(const struct device*, dev) + __field(int, drm_debug_category) + __dynamic_array(char, msg, 256) + ), + + TP_fast_assign( + int len; + + __entry->drm_debug_category = drm_debug_category; + __entry->dev = dev; + vsnprintf(__get_str(msg), 256, vaf->fmt, *vaf->va); + + len = strlen(__get_str(msg)); + if ((len > 0) && (__get_str(msg)[len-1] == '\n')) + len -= 1; + __get_str(msg)[len] = 0; + ), + + TP_printk("cat:%d, %s %s", __entry->drm_debug_category, + dev_name(__entry->dev), __get_str(msg)) +); + +#endif /* _TRACE_DRM_H */ + +/* This part must be outside protection */ +#include -- 2.33.1
[PATCH v11 10/19] drm_print: interpose drm_dev_dbg, __drm_dbg with forwarding macros
drm_dev_dbg() & _drm_dbg() sit below the categorized layer of the DRM debug API, and thus implement most of it. These are good places to insert dynamic-debug jump-label mechanics, allowing DRM to avoid the runtime cost of drm_debug_enabled(). Set up for this by changing the func names by adding '__' prefixes, and define forwarding macros to the new names. no functional changes. memory cost baseline: (unchanged) bash-5.1# drms_load [9.220389] dyndbg: 1 debug prints in module drm [9.224426] ACPI: bus type drm_connector registered [9.302192] dyndbg: 2 debug prints in module ttm [9.305033] dyndbg: 8 debug prints in module video [9.627563] dyndbg: 127 debug prints in module i915 [9.721505] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug. [ 10.091345] dyndbg: 2196 debug prints in module amdgpu [ 10.106589] [drm] amdgpu kernel modesetting enabled. [ 10.107270] amdgpu: CRAT table not found [ 10.107926] amdgpu: Virtual CRAT table created for CPU [ 10.108398] amdgpu: Topology: Add CPU node [ 10.168507] dyndbg: 3 debug prints in module wmi [ 10.329587] dyndbg: 3 debug prints in module nouveau Signed-off-by: Jim Cromie --- drivers/gpu/drm/drm_print.c | 10 +- include/drm/drm_print.h | 9 +++-- 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c index cfcb89ffd89d..5dd6713c14f2 100644 --- a/drivers/gpu/drm/drm_print.c +++ b/drivers/gpu/drm/drm_print.c @@ -259,8 +259,8 @@ void drm_dev_printk(const struct device *dev, const char *level, } EXPORT_SYMBOL(drm_dev_printk); -void drm_dev_dbg(const struct device *dev, enum drm_debug_category category, -const char *format, ...) +void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category, + const char *format, ...) { struct va_format vaf; va_list args; @@ -283,9 +283,9 @@ void drm_dev_dbg(const struct device *dev, enum drm_debug_category category, } va_end(args); } -EXPORT_SYMBOL(drm_dev_dbg); +EXPORT_SYMBOL(__drm_dev_dbg); -void __drm_dbg(enum drm_debug_category category, const char *format, ...) +void ___drm_dbg(enum drm_debug_category category, const char *format, ...) { struct va_format vaf; va_list args; @@ -304,7 +304,7 @@ void __drm_dbg(enum drm_debug_category category, const char *format, ...) va_end(args); } -EXPORT_SYMBOL(__drm_dbg); +EXPORT_SYMBOL(___drm_dbg); void __drm_err(const char *format, ...) { diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h index b4355bfd7888..f8fa5af11310 100644 --- a/include/drm/drm_print.h +++ b/include/drm/drm_print.h @@ -334,7 +334,7 @@ __printf(3, 4) void drm_dev_printk(const struct device *dev, const char *level, const char *format, ...); __printf(3, 4) -void drm_dev_dbg(const struct device *dev, enum drm_debug_category category, +void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category, const char *format, ...); /** @@ -383,6 +383,9 @@ void drm_dev_dbg(const struct device *dev, enum drm_debug_category category, } \ }) +#define drm_dev_dbg(dev, eCat, fmt, ...) 
\ + __drm_dev_dbg(dev, eCat, fmt, ##__VA_ARGS__) + /** * DRM_DEV_DEBUG() - Debug output for generic drm code * @@ -484,10 +487,12 @@ void drm_dev_dbg(const struct device *dev, enum drm_debug_category category, */ __printf(2, 3) -void __drm_dbg(enum drm_debug_category category, const char *format, ...); +void ___drm_dbg(enum drm_debug_category category, const char *format, ...); __printf(1, 2) void __drm_err(const char *format, ...); +#define __drm_dbg(fmt, ...)___drm_dbg(fmt, ##__VA_ARGS__) + /* Macros to make printk easier */ #define _DRM_PRINTK(once, level, fmt, ...) \ -- 2.33.1
[PATCH v11 09/19] drm_print: add CONFIG_DRM_USE_DYNAMIC_DEBUG
DRM.debug API is used thousands of times in drivers/gpu/drm/*; when these are implemented using dynamic-debug, DYNAMIC_DEBUG_METADATA adds a struct _ddebug (56 bytes) per site. Since i915 will have ~2k callsites, and amdgpu ~4k, the memory costs are substantial, and thus made configurable. For a baseline: bash-5.1# drms_load [ 10.923093] dyndbg: 1 debug prints in module drm [ 10.928345] ACPI: bus type drm_connector registered [ 11.034012] dyndbg: 2 debug prints in module ttm [ 11.038188] dyndbg: 8 debug prints in module video [ 11.551064] dyndbg: 127 debug prints in module i915 [ 11.617106] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug. [ 12.084856] dyndbg: 2196 debug prints in module amdgpu [ 12.099040] [drm] amdgpu kernel modesetting enabled. [ 12.099790] amdgpu: CRAT table not found [ 12.100523] amdgpu: Virtual CRAT table created for CPU [ 12.100901] amdgpu: Topology: Add CPU node [ 12.165382] dyndbg: 3 debug prints in module wmi [ 12.356424] dyndbg: 3 debug prints in module nouveau i915/gvt has pr_debugs that show up here amdgpu has many pr_debugs from macro expansions Signed-off-by: Jim Cromie fixup-mk --- drivers/gpu/drm/Kconfig | 12 drivers/gpu/drm/Makefile | 2 ++ 2 files changed, 14 insertions(+) diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 0039df26854b..4e7b865c3441 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -62,6 +62,18 @@ config DRM_DEBUG_MM If in doubt, say "N". +config DRM_USE_DYNAMIC_DEBUG + bool "use dynamic debug to implement drm.debug" + default y + depends on DRM + depends on DYNAMIC_DEBUG || DYNAMIC_DEBUG_CORE + depends on JUMP_LABEL + help + Use dynamic-debug to avoid drm_debug_enabled() runtime overheads. + Due to callsite counts in DRM drivers (~4k in amdgpu) and 56 + bytes per callsite, the .data costs can be substantial, and + are therefore configurable. + config DRM_DEBUG_SELFTEST tristate "kselftests for DRM" depends on DRM diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 0dff40bb863c..60f4d7bc27eb 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -3,6 +3,8 @@ # Makefile for the drm device driver. This driver provides support for the # Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher. +CFLAGS-$(CONFIG_DRM_USE_DYNAMIC_DEBUG) += -DDYNAMIC_DEBUG_MODULE + drm-y := drm_aperture.o drm_auth.o drm_cache.o \ drm_file.o drm_gem.o drm_ioctl.o drm_irq.o \ drm_drv.o \ -- 2.33.1
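For scale: at 56 bytes per struct _ddebug, the ~3817 amdgpu callsites reported later in the series (patch 12) come to roughly 3817 * 56 = ~214 KB of added .data for that one module; that is the cost this option makes selectable.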
[PATCH v11 07/19] drm_print: condense enum drm_debug_category
enum drm_debug_category has 10 hardcoded values, which could be "simplified" as sequential BIT(x)s. But lets take it one step further, removing the explicit initializations (other than starting at 1), and move the BIT() operation into drm_debug_enabled(). This gives us a more compact representation (4 bits), without loss of info; all DRM.debug api calls pass an enum parameter (and not a bit-OR of them), and the bitmask-iness of the enum's values is merely a micro-optimization to avoid doing BIT(category) at runtime. I doubt the extra bit-shift would be measurable here. And the 4-bit representation means it fits into struct _ddebug.class_id (commit:HEAD~1), setting up for further integration. The enum starts at 1, which respects the "reservation" of 0 as a special case; it is a non-category, and shouldn't get treated like one. Signed-off-by: Jim Cromie --- include/drm/drm_print.h | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h index 22fabdeed297..b4355bfd7888 100644 --- a/include/drm/drm_print.h +++ b/include/drm/drm_print.h @@ -279,49 +279,49 @@ enum drm_debug_category { * @DRM_UT_CORE: Used in the generic drm code: drm_ioctl.c, drm_mm.c, * drm_memory.c, ... */ - DRM_UT_CORE = 0x01, + DRM_UT_CORE = 1, /** * @DRM_UT_DRIVER: Used in the vendor specific part of the driver: i915, * radeon, ... macro. */ - DRM_UT_DRIVER = 0x02, + DRM_UT_DRIVER, /** * @DRM_UT_KMS: Used in the modesetting code. */ - DRM_UT_KMS = 0x04, + DRM_UT_KMS, /** * @DRM_UT_PRIME: Used in the prime code. */ - DRM_UT_PRIME= 0x08, + DRM_UT_PRIME, /** * @DRM_UT_ATOMIC: Used in the atomic code. */ - DRM_UT_ATOMIC = 0x10, + DRM_UT_ATOMIC, /** * @DRM_UT_VBL: Used for verbose debug message in the vblank code. */ - DRM_UT_VBL = 0x20, + DRM_UT_VBL, /** * @DRM_UT_STATE: Used for verbose atomic state debugging. */ - DRM_UT_STATE= 0x40, + DRM_UT_STATE, /** * @DRM_UT_LEASE: Used in the lease code. */ - DRM_UT_LEASE= 0x80, + DRM_UT_LEASE, /** * @DRM_UT_DP: Used in the DP code. */ - DRM_UT_DP = 0x100, + DRM_UT_DP, /** * @DRM_UT_DRMRES: Used in the drm managed resources code. */ - DRM_UT_DRMRES = 0x200, + DRM_UT_DRMRES }; static inline bool drm_debug_enabled(enum drm_debug_category category) { - return unlikely(__drm_debug & category); + return unlikely(__drm_debug & BIT(category)); } /* -- 2.33.1
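Concretely, DRM_UT_KMS changes from 0x04 to 3, and drm_debug_enabled(DRM_UT_KMS) now tests __drm_debug & BIT(3) rather than __drm_debug & 0x04; all ten values fit the 4-bit class_id field, with 0 left reserved.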
[PATCH v11 05/19] dyndbg: add desc, dev fields to event record
commit:HEAD~1 added pr_debug(), dev_dbg() params to the new events, but didn't actually capture the params. Do that now; add the other TP_* parts: __fields, fast-assign, and printk elements for the desccriptor and device params. The message capture part is copied from printk:console, it gets the whole message, including dyndbg's prefixing: the dev_name() etc, the optional module:function:line decorations, and the trailing newline (which is trimmed). dyndbg->trace-events must be enabled on both sides: in tracefs: echo 1 > /sys/kernel/tracing/events/dyndbg/enable in dyndbg:echo module drm +T > /proc/dynamic_debug/control This is good; it gives 2 orthogonal cuts at trace traffic, dyndbg can enable callsites indvidually, tracefs can (in principle) filter and trigger on the incoming event stream. ATM, TP_print adds "__entry->desc->{modname,function}", which is redundant with +Tmf enabled callsites. RFC Perhaps the whole decorations/prefix should be excluded from the event capture ? Vincent's skip-past-KERN_DEBUG trick could be extended to skip part or all of the prefix, and leave the "decorating" of events solely to TP_printk. Whats the right separation of concerns ? NB: __entry->desc is a pointer into kernel .data, a pretty stable reference, at least while the kernel is running. Signed-off-by: Jim Cromie --- include/trace/events/dyndbg.h | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/include/trace/events/dyndbg.h b/include/trace/events/dyndbg.h index 82620b10e968..2ac296cb451c 100644 --- a/include/trace/events/dyndbg.h +++ b/include/trace/events/dyndbg.h @@ -14,10 +14,12 @@ TRACE_EVENT(prdbg, TP_ARGS(desc, text, len), TP_STRUCT__entry( + __field(const struct _ddebug *, desc) __dynamic_array(char, msg, len + 1) ), TP_fast_assign( + __entry->desc = desc; /* * Each trace entry is printed in a new line. * If the msg finishes with '\n', cut it off @@ -30,8 +32,8 @@ TRACE_EVENT(prdbg, __get_str(msg)[len] = 0; ), - TP_printk("%s", __get_str(msg)) - + TP_printk("%s.%s %s", __entry->desc->modname, + __entry->desc->function, __get_str(msg)) ); /* capture dev_dbg() callsite descriptor, device, and message */ @@ -42,10 +44,14 @@ TRACE_EVENT(devdbg, TP_ARGS(desc, dev, text, len), TP_STRUCT__entry( + __field(const struct _ddebug *, desc) + __field(const struct device *, dev) __dynamic_array(char, msg, len + 1) ), TP_fast_assign( + __entry->desc = desc; + __entry->dev = (struct device *) dev; /* * Each trace entry is printed in a new line. * If the msg finishes with '\n', cut it off @@ -58,7 +64,8 @@ TRACE_EVENT(devdbg, __get_str(msg)[len] = 0; ), - TP_printk("%s", __get_str(msg)) + TP_printk("%s.%s %s", __entry->desc->modname, + __entry->desc->function, __get_str(msg)) ); #endif /* _TRACE_DYNDBG_H */ -- 2.33.1
[PATCH v11 06/19] dyndbg: add class_id to callsites
DRM defines/uses a 10 enum drm_debug_category to create exclusive classes of debug messages. To support this directly in dynamic-debug, add the following: - struct _ddebug.class_id 4 bits is enough for drm_debug_category and the query support: - struct _ddebug_query.class_id - ddebug_change - looks for "class" keyword - parse_class - accepts uint: 1-15, sets query->class_id - vpr_info_dq - displays new field - ddebug_proc_show - append column with "cls:%d" With this patch, the following cmd is accepted w/o error: #> echo module drm class 2 +p > /proc/dynamic_debug/control It currently does nothing, since no pr_debug callsites can yet be initialized with class_id != 0. Note that this is specifically an error: #> echo module drm class 0 +p > /proc/dynamic_debug/control parse_class() prohibits this, we reserve "class 0" as a non-category. This means that all good inputs are true, and 0 is special. NOTES: Patch extends DEFINE_DYNAMIC_DEBUG_METADATA() with a _CLS suffix, and a new cls param, and re-adds old name using extended name. Then, repeats those mods up through the Factory macros that use the METADATA* .data constructors, so as to actually supply the category into the initializer. CC: Rasmus Villemoes Signed-off-by: Jim Cromie --- .../admin-guide/dynamic-debug-howto.rst | 11 + include/linux/dynamic_debug.h | 41 +-- lib/dynamic_debug.c | 30 -- 3 files changed, 67 insertions(+), 15 deletions(-) diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst b/Documentation/admin-guide/dynamic-debug-howto.rst index 3c8f104a143f..7e1322660b10 100644 --- a/Documentation/admin-guide/dynamic-debug-howto.rst +++ b/Documentation/admin-guide/dynamic-debug-howto.rst @@ -35,6 +35,7 @@ Dynamic debug has even more useful features: - line number (including ranges of line numbers) - module name - format string + - class number:1-15 * Provides a debugfs control file: ``/dynamic_debug/control`` which can be read to display the complete list of known debug @@ -143,6 +144,7 @@ against. Possible keywords are::: 'module' string | 'format' string | 'line' line-range +'class' integer:[1-15] line-range ::= lineno | '-'lineno | @@ -217,6 +219,15 @@ line line -1605 // the 1605 lines from line 1 to line 1605 line 1600- // all lines from line 1600 to the end of the file +class +This expects a single integer, in range: 1-15. +Using this specification constrains to exact class_id matches. +This is intended for DRM.debug enumerated categories, plus 1. +class_id = 0 is reserved, returns an error currently. + +This differs from the others primarily in that it is not displayed +in the control file currently, add later as an extra column. + The flags specification comprises a change operation followed by one or more flag characters. The change operation is one of the characters:: diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index e0c2d7c0938b..e9483cd9ac1c 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -20,6 +20,7 @@ struct _ddebug { const char *function; const char *filename; const char *format; + unsigned int class_id:4; unsigned int lineno:18; /* * The flags field controls the behaviour at the callsite. 
@@ -91,7 +92,7 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor, const struct ib_device *ibdev, const char *fmt, ...); -#define DEFINE_DYNAMIC_DEBUG_METADATA(name, fmt) \ +#define DEFINE_DYNAMIC_DEBUG_METADATA_CLS(name, cls, fmt) \ static struct _ddebug __aligned(8) \ __section("__dyndbg") name = { \ .modname = KBUILD_MODNAME, \ @@ -100,9 +101,13 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor, .format = (fmt),\ .lineno = __LINE__, \ .flags = _DPRINTK_FLAGS_DEFAULT,\ + .class_id = cls,\ _DPRINTK_KEY_INIT \ } +#define DEFINE_DYNAMIC_DEBUG_METADATA(name, fmt) \ + DEFINE_DYNAMIC_DEBUG_METADATA_CLS(name, 0, fmt) + #ifdef CONFIG_JUMP_LABEL #ifdef DEBUG @@ -132,17 +137,21 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor, #endif /* CONFIG_JUMP_LABEL */ -#define __dynamic_func_call(id, fmt, func, ...) do { \ - DEFINE_DYNAMIC_DEBUG_METADATA(id, fmt); \ - if (DYNAMIC_DEBUG_BRANCH(id)) \ - func(&id, ##__VA_ARGS__); \ +#def
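The lib/dynamic_debug.c hunk is cut off above; a minimal sketch of the class-keyword parsing it describes might look like the following (illustrative only, and it omits the "class not specified" sentinel the real matching logic needs):

  /* sketch: accept "class <n>", n in 1..15; 0 is reserved and rejected */
  static int parse_class(struct ddebug_query *query, const char *str)
  {
      unsigned int cls;

      if (kstrtouint(str, 10, &cls) < 0 || cls == 0 || cls > 15)
          return -EINVAL;

      query->class_id = cls;
      return 0;
  }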
[PATCH v11 04/19] dyndbg: add trace-events for pr_debug, dev_dbg
In commit HEAD~1, pr_debug temporarily issued a printk:console event. Replace that use with 2 new events: bash-5.1# ls events/dyndbg/ devdbg enable filter prdbg ..called from pr_debug(), dev_dbg() respectively for dyndbg builds. This links the old pr_debug API to tracefs, via dyndbg, allowing pr_debug()s etc to add just a little more user-context to the trace-logs, and then at users option, less syslog. The 2 trace_* calls accept their caller's args respectively, keeping the available info w/o alteration; ISTM this is the best basis to add filtering later. 1- struct _ddebug *descriptor, giving tracefs all of dyndbg's info. 2- struct device *dev, for trace_devdbg(), if !!dev To pass the descriptor into the trace_*() calls, the callchain up to __dynamic_{pr_debug,{dev,netdev,ibdev}_dbg} are altered to provide it. The 2nd event also gets the struct device *dev if !!dev, otherwise a trace_prdbg() is issued, since we have an event for that, and don't need to waste the trace-buffer space. Signed-off-by: Jim Cromie --- include/trace/events/dyndbg.h | 67 +++ lib/dynamic_debug.c | 53 ++- 2 files changed, 96 insertions(+), 24 deletions(-) create mode 100644 include/trace/events/dyndbg.h diff --git a/include/trace/events/dyndbg.h b/include/trace/events/dyndbg.h new file mode 100644 index ..82620b10e968 --- /dev/null +++ b/include/trace/events/dyndbg.h @@ -0,0 +1,67 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM dyndbg + +#if !defined(_TRACE_DYNDBG_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_DYNDBG_H + +#include + +/* capture pr_debug() callsite descriptor and message */ +TRACE_EVENT(prdbg, + TP_PROTO(const struct _ddebug *desc, const char *text, size_t len), + + TP_ARGS(desc, text, len), + + TP_STRUCT__entry( + __dynamic_array(char, msg, len + 1) + ), + + TP_fast_assign( + /* +* Each trace entry is printed in a new line. +* If the msg finishes with '\n', cut it off +* to avoid blank lines in the trace. +*/ + if ((len > 0) && (text[len-1] == '\n')) + len -= 1; + + memcpy(__get_str(msg), text, len); + __get_str(msg)[len] = 0; + ), + + TP_printk("%s", __get_str(msg)) + +); + +/* capture dev_dbg() callsite descriptor, device, and message */ +TRACE_EVENT(devdbg, + TP_PROTO(const struct _ddebug *desc, const struct device *dev, +const char *text, size_t len), + + TP_ARGS(desc, dev, text, len), + + TP_STRUCT__entry( + __dynamic_array(char, msg, len + 1) + ), + + TP_fast_assign( + /* +* Each trace entry is printed in a new line. +* If the msg finishes with '\n', cut it off +* to avoid blank lines in the trace. 
+*/ + if ((len > 0) && (text[len-1] == '\n')) + len -= 1; + + memcpy(__get_str(msg), text, len); + __get_str(msg)[len] = 0; + ), + + TP_printk("%s", __get_str(msg)) +); + +#endif /* _TRACE_DYNDBG_H */ + +/* This part must be outside protection */ +#include diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c index 3fd035cd4653..2efdc4f1553f 100644 --- a/lib/dynamic_debug.c +++ b/lib/dynamic_debug.c @@ -36,7 +36,9 @@ #include #include #include -#include + +#define CREATE_TRACE_POINTS +#include #include @@ -648,7 +650,8 @@ struct dynamic_trace_bufs { static DEFINE_PER_CPU(struct dynamic_trace_bufs, dynamic_trace_bufs); static DEFINE_PER_CPU(int, dynamic_trace_reserve); -static void dynamic_trace(const char *fmt, va_list args) +static void dynamic_trace(struct _ddebug *desc, const struct device *dev, + const char *fmt, va_list args) { struct dynamic_trace_buf *buf; int bufidx; @@ -667,7 +670,11 @@ static void dynamic_trace(const char *fmt, va_list args) buf = this_cpu_ptr(dynamic_trace_bufs.bufs) + bufidx; len = vscnprintf(buf->buf, sizeof(buf->buf), fmt, args); - trace_console(buf->buf, len); + + if (dev == NULL) + trace_prdbg(desc, buf->buf, len); + else + trace_devdbg(desc, dev, buf->buf, len); out: /* As above. */ @@ -676,9 +683,9 @@ static void dynamic_trace(const char *fmt, va_list args) preempt_enable_notrace(); } -static void dynamic_printk(unsigned int flags, const char *fmt, ...) +static void dynamic_printk(struct _ddebug *desc, const char *fmt, ...) { - if (flags & _DPRINTK_FLAGS_TRACE) { + if (desc->flags & _DPRINTK_FLAGS_TRACE) { va_list args; va_start(args, fmt); @@ -686,11 +693,11 @@ static void dynamic_printk(unsigned int flags, const char *fmt, ...)
[PATCH v11 03/19] dyndbg: add write-to-tracefs code
adds: dynamic_trace() uses trace_console() temporarily to issue printk:console event uses internal-ish __ftrace_trace_stack code: 4-context buffer stack, barriers per Steve call it from new funcs: dynamic_printk() - print to both syslog/tracefs dynamic_dev_printk() - dev-print to both syslog/tracefs These handle both _DPRINTK_FLAGS_PRINTK and _DPRINTK_FLAGS_TRACE cases, allowing to vsnprintf the message once and use it for both, skipping past the KERN_DEBUG character for tracing. Finally, adjust the callers: __dynamic_{pr_debug,{,net,ib}dev_dbg}, replacing printk and dev_printk with the new funcs above. The _DPRINTK_FLAGS_TRACE flag character s 'T', so the following finds all callsites enabled for tracing: grep -P =p?T /proc/dynamic_debug/control Enabling debug-to-tracefs is 2 steps: # event enable echo 1 > /sys/kernel/tracing/events/dyndbg/enable # callsite enable echo module foo +T > /proc/dynamic_debug/control This patch,~1,~2 are based upon: https://lore.kernel.org/lkml/20200825153338.17061-1-vincent.whitchu...@axis.com/ .. with simplification of temporarily reusing trace_console() rather than adding a new printk:dyndbg event. Soon, add 2 new events capturing the pr_debug & dev_dbg() args. CC: vincent.whitchu...@axis.com Signed-off-by: Jim Cromie --- .../admin-guide/dynamic-debug-howto.rst | 1 + lib/dynamic_debug.c | 155 +++--- 2 files changed, 130 insertions(+), 26 deletions(-) diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst b/Documentation/admin-guide/dynamic-debug-howto.rst index a89cfa083155..3c8f104a143f 100644 --- a/Documentation/admin-guide/dynamic-debug-howto.rst +++ b/Documentation/admin-guide/dynamic-debug-howto.rst @@ -228,6 +228,7 @@ of the characters:: The flags are:: penables the pr_debug() callsite. + Tenables callsite to issue a dyndbg:* trace-event fInclude the function name in the printed message lInclude line number in the printed message mInclude module name in the printed message diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c index 78a2912976c1..3fd035cd4653 100644 --- a/lib/dynamic_debug.c +++ b/lib/dynamic_debug.c @@ -36,6 +36,7 @@ #include #include #include +#include #include @@ -87,6 +88,7 @@ static inline const char *trim_prefix(const char *path) static struct { unsigned flag:8; char opt_char; } opt_array[] = { { _DPRINTK_FLAGS_PRINTK, 'p' }, + { _DPRINTK_FLAGS_TRACE, 'T' }, { _DPRINTK_FLAGS_INCL_MODNAME, 'm' }, { _DPRINTK_FLAGS_INCL_FUNCNAME, 'f' }, { _DPRINTK_FLAGS_INCL_LINENO, 'l' }, @@ -628,6 +630,96 @@ static inline char *dynamic_emit_prefix(struct _ddebug *desc, char *buf) return buf; } +/* + * This code is heavily based on __ftrace_trace_stack(). + * + * Allow 4 levels of nesting: normal, softirq, irq, NMI. + */ +#define DYNAMIC_TRACE_NESTING 4 + +struct dynamic_trace_buf { + char buf[256]; +}; + +struct dynamic_trace_bufs { + struct dynamic_trace_buf bufs[DYNAMIC_TRACE_NESTING]; +}; + +static DEFINE_PER_CPU(struct dynamic_trace_bufs, dynamic_trace_bufs); +static DEFINE_PER_CPU(int, dynamic_trace_reserve); + +static void dynamic_trace(const char *fmt, va_list args) +{ + struct dynamic_trace_buf *buf; + int bufidx; + int len; + + preempt_disable_notrace(); + + bufidx = __this_cpu_inc_return(dynamic_trace_reserve) - 1; + + if (WARN_ON_ONCE(bufidx > DYNAMIC_TRACE_NESTING)) + goto out; + + /* For the same reasons as in __ftrace_trace_stack(). */ + barrier(); + + buf = this_cpu_ptr(dynamic_trace_bufs.bufs) + bufidx; + + len = vscnprintf(buf->buf, sizeof(buf->buf), fmt, args); + trace_console(buf->buf, len); + +out: + /* As above. 
*/ + barrier(); + __this_cpu_dec(dynamic_trace_reserve); + preempt_enable_notrace(); +} + +static void dynamic_printk(unsigned int flags, const char *fmt, ...) +{ + if (flags & _DPRINTK_FLAGS_TRACE) { + va_list args; + + va_start(args, fmt); + /* +* All callers include the KERN_DEBUG prefix to keep the +* vprintk case simple; strip it out for tracing. +*/ + dynamic_trace(fmt + strlen(KERN_DEBUG), args); + va_end(args); + } + + if (flags & _DPRINTK_FLAGS_PRINTK) { + va_list args; + + va_start(args, fmt); + vprintk(fmt, args); + va_end(args); + } +} + +static void dynamic_dev_printk(unsigned int flags, const struct device *dev, + const char *fmt, ...) +{ + + if (flags & _DPRINTK_FLAGS_TRACE) { + va_list args; + + va_start(args, fmt); + dynamic_trace(fmt, args); + va_end(args); + } + + if (flags & _DPRINTK_FLAGS_PRINTK) { +
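The hunk that converts the callers is cut off above; a minimal sketch of what the converted __dynamic_pr_debug() might look like, assuming the dynamic_printk(flags, fmt, ...) helper shown earlier in this patch and the existing dynamic_emit_prefix()/PREFIX_SIZE helpers in lib/dynamic_debug.c (the exact hunk is not reproduced here):

        void __dynamic_pr_debug(struct _ddebug *descriptor, const char *fmt, ...)
        {
                struct va_format vaf;
                va_list args;
                char buf[PREFIX_SIZE] = "";

                BUG_ON(!descriptor);
                BUG_ON(!fmt);

                va_start(args, fmt);
                vaf.fmt = fmt;
                vaf.va = &args;
                /* KERN_DEBUG is always passed down; dynamic_printk() strips it
                 * again (fmt + strlen(KERN_DEBUG)) before handing the message
                 * to dynamic_trace(), as shown in the hunk above. */
                dynamic_printk(descriptor->flags, KERN_DEBUG "%s%pV",
                               dynamic_emit_prefix(descriptor, buf), &vaf);
                va_end(args);
        }

The dev_dbg variants would go through dynamic_dev_printk() the same way, passing the struct device along unchanged.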
[PATCH v11 02/19] dyndbg: add _DPRINTK_FLAGS_TRACE
add new flag, and OR it into _DPRINTK_FLAGS_ENABLED definition CC: vincent.whitchu...@axis.com Signed-off-by: Jim Cromie --- include/linux/dynamic_debug.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index 257a7bcbbe36..e0c2d7c0938b 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -37,7 +37,9 @@ struct _ddebug { (_DPRINTK_FLAGS_INCL_MODNAME | _DPRINTK_FLAGS_INCL_FUNCNAME |\ _DPRINTK_FLAGS_INCL_LINENO | _DPRINTK_FLAGS_INCL_TID) -#define _DPRINTK_FLAGS_ENABLED _DPRINTK_FLAGS_PRINTK +#define _DPRINTK_FLAGS_TRACE (1<<5) +#define _DPRINTK_FLAGS_ENABLED (_DPRINTK_FLAGS_PRINTK | \ +_DPRINTK_FLAGS_TRACE) #if defined DEBUG #define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINTK -- 2.33.1
[PATCH v11 01/19] dyndbg: add _DPRINTK_FLAGS_ENABLED
Distinguish the condition: _DPRINTK_FLAGS_ENABLED from the bit: _DPRINTK_FLAGS_PRINT, in preparation for adding _DPRINTK_FLAGS_TRACE next. Also add a 'K' to get _DPRINTK_FLAGS_PRINTK, to ensure it is not used elsewhere with a stale meaning. CC: vincent.whitchu...@axis.com Signed-off-by: Jim Cromie --- include/linux/dynamic_debug.h | 10 ++ lib/dynamic_debug.c | 8 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index dce631e678dd..257a7bcbbe36 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -27,7 +27,7 @@ struct _ddebug { * writes commands to /dynamic_debug/control */ #define _DPRINTK_FLAGS_NONE0 -#define _DPRINTK_FLAGS_PRINT (1<<0) /* printk() a message using the format */ +#define _DPRINTK_FLAGS_PRINTK (1<<0) /* printk() a message using the format */ #define _DPRINTK_FLAGS_INCL_MODNAME(1<<1) #define _DPRINTK_FLAGS_INCL_FUNCNAME (1<<2) #define _DPRINTK_FLAGS_INCL_LINENO (1<<3) @@ -37,8 +37,10 @@ struct _ddebug { (_DPRINTK_FLAGS_INCL_MODNAME | _DPRINTK_FLAGS_INCL_FUNCNAME |\ _DPRINTK_FLAGS_INCL_LINENO | _DPRINTK_FLAGS_INCL_TID) +#define _DPRINTK_FLAGS_ENABLED _DPRINTK_FLAGS_PRINTK + #if defined DEBUG -#define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINT +#define _DPRINTK_FLAGS_DEFAULT _DPRINTK_FLAGS_PRINTK #else #define _DPRINTK_FLAGS_DEFAULT 0 #endif @@ -120,10 +122,10 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor, #ifdef DEBUG #define DYNAMIC_DEBUG_BRANCH(descriptor) \ - likely(descriptor.flags & _DPRINTK_FLAGS_PRINT) + likely(descriptor.flags & _DPRINTK_FLAGS_ENABLED) #else #define DYNAMIC_DEBUG_BRANCH(descriptor) \ - unlikely(descriptor.flags & _DPRINTK_FLAGS_PRINT) + unlikely(descriptor.flags & _DPRINTK_FLAGS_ENABLED) #endif #endif /* CONFIG_JUMP_LABEL */ diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c index dd7f56af9aed..78a2912976c1 100644 --- a/lib/dynamic_debug.c +++ b/lib/dynamic_debug.c @@ -86,7 +86,7 @@ static inline const char *trim_prefix(const char *path) } static struct { unsigned flag:8; char opt_char; } opt_array[] = { - { _DPRINTK_FLAGS_PRINT, 'p' }, + { _DPRINTK_FLAGS_PRINTK, 'p' }, { _DPRINTK_FLAGS_INCL_MODNAME, 'm' }, { _DPRINTK_FLAGS_INCL_FUNCNAME, 'f' }, { _DPRINTK_FLAGS_INCL_LINENO, 'l' }, @@ -210,10 +210,10 @@ static int ddebug_change(const struct ddebug_query *query, if (newflags == dp->flags) continue; #ifdef CONFIG_JUMP_LABEL - if (dp->flags & _DPRINTK_FLAGS_PRINT) { - if (!(modifiers->flags & _DPRINTK_FLAGS_PRINT)) + if (dp->flags & _DPRINTK_FLAGS_ENABLED) { + if (!(modifiers->flags & _DPRINTK_FLAGS_ENABLED)) static_branch_disable(&dp->key.dd_key_true); - } else if (modifiers->flags & _DPRINTK_FLAGS_PRINT) + } else if (modifiers->flags & _DPRINTK_FLAGS_ENABLED) static_branch_enable(&dp->key.dd_key_true); #endif dp->flags = newflags; -- 2.33.1
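For context, this is roughly how the branch is consumed at a pr_debug() callsite (a simplified sketch; the real macro goes through _dynamic_func_call() and the jump-label variant is omitted). With the mask in place, either 'p', or 'T' from patch 2/19 above, keeps the callsite live:

        /* simplified expansion of dynamic_pr_debug() */
        #define dynamic_pr_debug_sketch(fmt, ...)                       \
        do {                                                            \
                DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, fmt);         \
                if (DYNAMIC_DEBUG_BRANCH(descriptor))                   \
                        __dynamic_pr_debug(&descriptor, pr_fmt(fmt),    \
                                           ##__VA_ARGS__);              \
        } while (0)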
[PATCH v11 00/19] dyndbg & drm.debug to tracefs
Hi Jason, Steve, Greg, DRM-folks, This patchset plumbs debug streams, from dynamic-debug, and from drm.debug, into tracefs. Enabling traffic is done on both source & destination: # enable at source echo module usbcore +T > /proc/dynamic_debug/control # enable events into tracefs echo 1 > /sys/kernel/tracing/events/dyndbg/enable # enable at source echo 0x03 > /sys/module/drm/parameters/debug # enable events into tracefs echo 1 > /sys/kernel/tracing/events/drm/enable This allows selectivity at the sources, and in principle, filtering at tracefs (which is unaddressed, except for arg-passthru). Here is v11, it differs substantially from v10: A: All tracefs action is via 4 new trace-events: from dyndbg: pr_debug() sends trace_prdbg() dev_dbg() sends trace_devdbg() both preserve args unchanged similarly from drm: drm_dev_dbg() -> trace_drm_devdbg() drm_dbg() -> trace_drm_dbg() again, args are captured unchanged. For some reason 3 other drm_vblank* events showed up, I don't know why. These 4 events (not counting vblank) all capture the args unchanged; ISTM full exposure of available info is best for filtering/triggering purposes. B: dynamic-debug gets proper .class_id, and query support (dyndbg:) so that this is legal input: echo module drm class 3 +T > /proc/dynamic_debug/control v10 used "format drm:core:", which worked, but required addition of category string prefixes, and possible user-facing-changes issues. New field is uint:4, big enough to fit DRM's enum drm_debug_category (once it has been condensed). The name is .class_id, distinct from but related to DRM's "category". This also includes _CLS name & arg extensions of the Factory macros that implement dyndbg's jump-label/NOOP optimizations. C: integration of dyndbg into drm.debug (drm_print:) The purpose here (and originally) is to avoid drm_debug_enabled() runtime costs, and to put a more flexible substrate underneath the sysfs bitmap api. I've made it CONFIG dependent, since each _ddebug is 56 bytes, and i915 & amdgpu have ~1700 & ~3800 callsites respectively, of which 127 & ~2k are plain pr_debugs. 1. We shrink enum drm_debug_category to fit in 4 bits, at nominal cost of BIT(category) at runtime, which dyndbg will avoid anyway. 2. Add the trace_drm_*dbg() events 3. insert macro indirection, and use it to wrap drm_*dbg()s in dyndbg's _no_desc_ Factory macro. 4. add __drm_debug_enabled (optimized to true) to use behind jumplabel. 5. use _CLS extension of _no_desc_ Factory macro this makes cls available to initialize _ddebug.class_id 6. alter drm_*dbg, replacing category with struct _ddebug *desc. desc.class_id is category desc.flags allows selection of PRINTK or TRACE or both 7. propagate struct _ddebug *desc thru trace_drm_*dbg() make all of _ddebug available for filtering 8.
add sysfs bitmap to dyndbg, use it to implement drm.debug D: The "flight-recorder" special instance was unneeded, and is gone: this handles it generically: mkdir instances/flightrec echo 1 > instances/flightrec/events/drm/enable echo module autopilot +T >/proc/dynamic_debug/control v10 is here: https://lore.kernel.org/lkml/20211105192637.2370737-1-jim.cro...@gmail.com/ patches 1-3 are from: https://lore.kernel.org/lkml/20200721141105.16034-1-vincent.whitchu...@axis.com/ this patchset addresses goals of: https://patchwork.freedesktop.org/series/78133/ https://lore.kernel.org/lkml/3706af20bc64a320ff8f3ff8950738b988f4bdf5.1636452784.git.quic_saipr...@quicinc.com/ Jim Cromie (19): 1st 3 are basically direct from execpt I dropped his printk:dyndbg event: dyndbg: add _DPRINTK_FLAGS_ENABLED dyndbg: add _DPRINTK_FLAGS_TRACE dyndbg: add write-to-tracefs code add 2 events, and record args (could be squashed) dyndbg: add trace-events for pr_debug, dev_dbg dyndbg: add desc, dev fields to event record add field, selection mechanisms, and CLS extensions to Factory macros dyndbg: add class_id to callsites make category fit in .class_id: drm_print: condense enum drm_debug_category repeat trace event addition: drm_print: add trace_drm_dbg, trace_drm_devdbg events kconfig+Make-flag: drm_print: add CONFIG_DRM_USE_DYNDBG macro indirection: drm_print: interpose drm_dev_dbg, __drm_dbg with forwarding macros add >control entries for ~4660 drm.debug callsites: drm_print: wrap drm_dev_dbg in _dynamic_func_call_no_desc drm_print: wrap drm_dbg in _dynamic_func_call_no_desc prep: drm_print: refine drm_debug_enabled for dyndbg+jump-label drive-by: drm_print: prefer bare printk KERN_DEBUG on generic fn get .class_id initialized at compile. drm_print: use _dynamic_func_call_no_desc_cls need this to selectively trace/print: drm_print: add struct _ddebug desc to drm_dbg, drm_dev_dbg propagate arg upgrade of HEAD~1 into trace-events: drm_print: add struct _ddebug *desc to trace-drm-*() params add and use sysfs b
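The two dyndbg events named above (trace_prdbg(), trace_devdbg()) are added in later patches of the series that are not reproduced in this excerpt. Purely to illustrate the shape such an event takes, a hypothetical minimal prdbg event could be declared along these lines; the field layout and TP_printk format here are assumptions, not the series' actual definition:

        /* include/trace/events/dyndbg.h -- hypothetical sketch only */
        #undef TRACE_SYSTEM
        #define TRACE_SYSTEM dyndbg

        #if !defined(_TRACE_DYNDBG_H) || defined(TRACE_HEADER_MULTI_READ)
        #define _TRACE_DYNDBG_H

        #include <linux/dynamic_debug.h>
        #include <linux/tracepoint.h>

        TRACE_EVENT(prdbg,
                TP_PROTO(const struct _ddebug *desc, const char *text, size_t len),

                TP_ARGS(desc, text, len),

                TP_STRUCT__entry(
                        __field(const struct _ddebug *, desc)
                        __dynamic_array(char, msg, len + 1)
                ),

                TP_fast_assign(
                        __entry->desc = desc;
                        memcpy(__get_dynamic_array(msg), text, len);
                        ((char *)__get_dynamic_array(msg))[len] = '\0';
                ),

                TP_printk("%s", __get_str(msg))
        );

        #endif /* _TRACE_DYNDBG_H */

        /* This part must be outside the include guard */
        #include <trace/define_trace.h>

With TRACE_SYSTEM set to dyndbg, the enable knob lands exactly where the cover letter points: /sys/kernel/tracing/events/dyndbg/enable.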
Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV
On 2022/1/7 上午3:13, Andrey Grodzovsky wrote: > > On 2022-01-06 12:18 a.m., JingWen Chen wrote: >> On 2022/1/6 下午12:59, JingWen Chen wrote: >>> On 2022/1/6 上午2:24, Andrey Grodzovsky wrote: On 2022-01-05 2:59 a.m., Christian König wrote: > Am 05.01.22 um 08:34 schrieb JingWen Chen: >> On 2022/1/5 上午12:56, Andrey Grodzovsky wrote: >>> On 2022-01-04 6:36 a.m., Christian König wrote: Am 04.01.22 um 11:49 schrieb Liu, Monk: > [AMD Official Use Only] > >>> See the FLR request from the hypervisor is just another source of >>> signaling the need for a reset, similar to each job timeout on each >>> queue. Otherwise you have a race condition between the hypervisor >>> and the scheduler. > No it's not, FLR from hypervisor is just to notify guest the hw VF > FLR is about to start or was already executed, but host will do FLR > anyway without waiting for guest too long > Then we have a major design issue in the SRIOV protocol and really need to question this. How do you want to prevent a race between the hypervisor resetting the hardware and the client trying the same because of a timeout? As far as I can see the procedure should be: 1. We detect that a reset is necessary, either because of a fault a timeout or signal from hypervisor. 2. For each of those potential reset sources a work item is send to the single workqueue. 3. One of those work items execute first and prepares the reset. 4. We either do the reset our self or notify the hypervisor that we are ready for the reset. 5. Cleanup after the reset, eventually resubmit jobs etc.. 6. Cancel work items which might have been scheduled from other reset sources. It does make sense that the hypervisor resets the hardware without waiting for the clients for too long, but if we don't follow this general steps we will always have a race between the different components. >>> Monk, just to add to this - if indeed as you say that 'FLR from >>> hypervisor is just to notify guest the hw VF FLR is about to start or >>> was already executed, but host will do FLR anyway without waiting for >>> guest too long' >>> and there is no strict waiting from the hypervisor for >>> IDH_READY_TO_RESET to be recived from guest before starting the reset >>> then setting in_gpu_reset and locking reset_sem from guest side is not >>> really full proof >>> protection from MMIO accesses by the guest - it only truly helps if >>> hypervisor waits for that message before initiation of HW reset. >>> >> Hi Andrey, this cannot be done. If somehow guest kernel hangs and never >> has the chance to send the response back, then other VFs will have to >> wait it reset. All the vfs will hang in this case. Or sometimes the >> mailbox has some delay and other VFs will also wait. The user of other >> VFs will be affected in this case. > Yeah, agree completely with JingWen. The hypervisor is the one in charge > here, not the guest. > > What the hypervisor should do (and it already seems to be designed that > way) is to send the guest a message that a reset is about to happen and > give it some time to response appropriately. > > The guest on the other hand then tells the hypervisor that all processing > has stopped and it is ready to restart. If that doesn't happen in time > the hypervisor should eliminate the guest probably trigger even more > severe consequences, e.g. restart the whole VM etc... > > Christian. So what's the end conclusion here regarding dropping this particular patch ? 
Seems to me we still need to drop it to prevent driver's MMIO access to the GPU during reset from various places in the code. Andrey >>> Hi Andrey & Christian, >>> >>> I have ported your patch(drop the reset_sem and in_gpu_reset in flr work) >>> and run some tests. If a engine hang during an OCL benchmark(using kfd), we >>> can see the logs below: > > > Did you port the entire patchset or just 'drm/amd/virt: Drop concurrent GPU > reset protection for SRIOV' ? > > I ported the entire patchset >>> >>> [ 397.190727] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. >>> [ 397.301496] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. >>> [ 397.406601] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. >>> [ 397.532343] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. >>> [ 397.642251] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. >>> [ 397.746634] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. >>> [ 397.850761] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. >>> [ 397.960544] amdgpu
RE: [PATCH 2/7] drm/amd/pm: drop unneeded vcn/jpeg_gate_lock
[Public] > -Original Message- > From: Chen, Guchun > Sent: Thursday, January 6, 2022 9:22 PM > To: Quan, Evan ; amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Quan, Evan > > Subject: RE: [PATCH 2/7] drm/amd/pm: drop unneeded vcn/jpeg_gate_lock > > [Public] > > err0_out: > - mutex_unlock(&power_gate->jpeg_gate_lock); > - mutex_unlock(&power_gate->vcn_gate_lock); > - > return ret; > > Is it better to refactor the code to drop error path of "err0_out"? [Quan, Evan] Sure, I can do that. BR Evan > > Regards, > Guchun > > -Original Message- > From: amd-gfx On Behalf Of Evan > Quan > Sent: Thursday, January 6, 2022 1:57 PM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Quan, Evan > > Subject: [PATCH 2/7] drm/amd/pm: drop unneeded vcn/jpeg_gate_lock > > As those related APIs are already protected by adev->pm.mutex. > > Signed-off-by: Evan Quan > Change-Id: I762fab96bb1c034c153b029f939ec6e498460007 > --- > drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 56 +++ > drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 2 - > 2 files changed, 8 insertions(+), 50 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > index ae91ef2078bf..ecbc768dfe2f 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > @@ -158,8 +158,8 @@ static u32 smu_get_sclk(void *handle, bool low) > return clk_freq * 100; > } > > -static int smu_dpm_set_vcn_enable_locked(struct smu_context *smu, > - bool enable) > +static int smu_dpm_set_vcn_enable(struct smu_context *smu, > + bool enable) > { > struct smu_power_context *smu_power = &smu->smu_power; > struct smu_power_gate *power_gate = &smu_power->power_gate; > @@ -178,24 +178,8 @@ static int smu_dpm_set_vcn_enable_locked(struct > smu_context *smu, > return ret; > } > > -static int smu_dpm_set_vcn_enable(struct smu_context *smu, > - bool enable) > -{ > - struct smu_power_context *smu_power = &smu->smu_power; > - struct smu_power_gate *power_gate = &smu_power->power_gate; > - int ret = 0; > - > - mutex_lock(&power_gate->vcn_gate_lock); > - > - ret = smu_dpm_set_vcn_enable_locked(smu, enable); > - > - mutex_unlock(&power_gate->vcn_gate_lock); > - > - return ret; > -} > - > -static int smu_dpm_set_jpeg_enable_locked(struct smu_context *smu, > - bool enable) > +static int smu_dpm_set_jpeg_enable(struct smu_context *smu, > +bool enable) > { > struct smu_power_context *smu_power = &smu->smu_power; > struct smu_power_gate *power_gate = &smu_power->power_gate; > @@ -214,22 +198,6 @@ static int smu_dpm_set_jpeg_enable_locked(struct > smu_context *smu, > return ret; > } > > -static int smu_dpm_set_jpeg_enable(struct smu_context *smu, > -bool enable) > -{ > - struct smu_power_context *smu_power = &smu->smu_power; > - struct smu_power_gate *power_gate = &smu_power->power_gate; > - int ret = 0; > - > - mutex_lock(&power_gate->jpeg_gate_lock); > - > - ret = smu_dpm_set_jpeg_enable_locked(smu, enable); > - > - mutex_unlock(&power_gate->jpeg_gate_lock); > - > - return ret; > -} > - > /** > * smu_dpm_set_power_gate - power gate/ungate the specific IP block > * > @@ -619,17 +587,14 @@ static int smu_set_default_dpm_table(struct > smu_context *smu) > if (!smu->ppt_funcs->set_default_dpm_table) > return 0; > > - mutex_lock(&power_gate->vcn_gate_lock); > - mutex_lock(&power_gate->jpeg_gate_lock); > - > vcn_gate = atomic_read(&power_gate->vcn_gated); > jpeg_gate = atomic_read(&power_gate->jpeg_gated); > > - ret = smu_dpm_set_vcn_enable_locked(smu, true); > + ret = 
smu_dpm_set_vcn_enable(smu, true); > if (ret) > goto err0_out; > > - ret = smu_dpm_set_jpeg_enable_locked(smu, true); > + ret = smu_dpm_set_jpeg_enable(smu, true); > if (ret) > goto err1_out; > > @@ -638,13 +603,10 @@ static int smu_set_default_dpm_table(struct > smu_context *smu) > dev_err(smu->adev->dev, > "Failed to setup default dpm clock tables!\n"); > > - smu_dpm_set_jpeg_enable_locked(smu, !jpeg_gate); > + smu_dpm_set_jpeg_enable(smu, !jpeg_gate); > err1_out: > - smu_dpm_set_vcn_enable_locked(smu, !vcn_gate); > + smu_dpm_set_vcn_enable(smu, !vcn_gate); > err0_out: > - mutex_unlock(&power_gate->jpeg_gate_lock); > - mutex_unlock(&power_gate->vcn_gate_lock); > - > return ret; > } > > @@ -1006,8 +968,6 @@ static int smu_sw_init(void *handle) > > atomic_set(&smu->smu_power.power_gate.vcn_gated, 1); > atomic_set(&smu->smu_power.power_gate.jpeg_gated, 1); > - mutex_ini
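A sketch of the refactor Guchun suggests: with the mutexes gone, err0_out only falls through to "return ret", so the first error path can return directly and the label disappears (based on the code shown above; local declarations follow the existing function):

        static int smu_set_default_dpm_table(struct smu_context *smu)
        {
                struct smu_power_context *smu_power = &smu->smu_power;
                struct smu_power_gate *power_gate = &smu_power->power_gate;
                int vcn_gate, jpeg_gate, ret = 0;

                if (!smu->ppt_funcs->set_default_dpm_table)
                        return 0;

                vcn_gate = atomic_read(&power_gate->vcn_gated);
                jpeg_gate = atomic_read(&power_gate->jpeg_gated);

                ret = smu_dpm_set_vcn_enable(smu, true);
                if (ret)
                        return ret;     /* err0_out label no longer needed */

                ret = smu_dpm_set_jpeg_enable(smu, true);
                if (ret)
                        goto err_out;

                ret = smu->ppt_funcs->set_default_dpm_table(smu);
                if (ret)
                        dev_err(smu->adev->dev,
                                "Failed to setup default dpm clock tables!\n");

                smu_dpm_set_jpeg_enable(smu, !jpeg_gate);
        err_out:
                smu_dpm_set_vcn_enable(smu, !vcn_gate);

                return ret;
        }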
RE: [PATCH 3/7] drm/amd/pm: drop unneeded smu->metrics_lock
[AMD Official Use Only] > -Original Message- > From: Lazar, Lijo > Sent: Thursday, January 6, 2022 4:29 PM > To: Quan, Evan ; amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander > Subject: Re: [PATCH 3/7] drm/amd/pm: drop unneeded smu->metrics_lock > > > > On 1/6/2022 12:32 PM, Quan, Evan wrote: > > [AMD Official Use Only] > > > > > > > >> -Original Message- > >> From: Lazar, Lijo > >> Sent: Thursday, January 6, 2022 2:17 PM > >> To: Quan, Evan ; amd-gfx@lists.freedesktop.org > >> Cc: Deucher, Alexander > >> Subject: Re: [PATCH 3/7] drm/amd/pm: drop unneeded smu- > >metrics_lock > >> > >> > >> > >> On 1/6/2022 11:27 AM, Evan Quan wrote: > >>> As all those related APIs are already well protected by > >>> adev->pm.mutex and smu->message_lock. > >>> > >> > >> This one may be widely used. Instead of relying on pm.mutex it's > >> better to keep metrics lock so that multiple clients can read data > >> without waiting on other APIs that use pm.mutex. > > [Quan, Evan] If I understand it correctly, what you wanted to express is to > use fine-grained lock instead of cross-grained one to avoid chasing for the > same lock. > > Yes, that was what we did before and that's why we have so many types of > locks. Below are my considerations for this: > > 1. We actually do not have such issue that many APIs/clients chasing for the > same lock. Thus fine-grained locks cannot bring much benefits. > > Take the metrics_lock here for example. The data protected by > metrics_lock are for those pm sysfs interfaces. Those sysfs interface are not > so frequently called. And almost all the time, they are called one after one. > So, it's rarely they will chase for the same lock. > > > > It's not just sysfs, there are other interfaces like sensors, hwmons etc. > Basically, metrics table provides data like GFX activity or throttler status > that > may be continuously monitored by app layer. So other APIs could suffer. My > thought is to just keep metrics under a separate lock and not tie with > pm.mutex. [Quan, Evan] I doubt about the guess that "other APIs could suffer if metrics data is continuously polled by app layer". Since according to my previous test(watch -n 0.1 "cat /sys/kernel/debug/dri/0/amdgpu_pm_info") which polled the metrics data every 0.1 second, the smu busy percent was almost not affected. So, even the metric data is polled every second by app layer, that should not enforce much loading on the CPUs and SMU cores. Also, keeping a separate lock for metrics data does not help much with the issue mentioned here(even if there is such). As they(app layer and other APIs) will still chase for another lock(message_lock which is kept) for interaction with SMU. BR Evan > > Thanks, > Lijo > > > 2. Cross-grained lock can simplify our implementations. It's hard to > > believe, > there is 10+(actually 13 as I counted) different types of locks used in our > existing power code. > > By the cross-grained lock, we can simplify the code and protect us from > some unintentional dead-locks(I actually run into that several times and it's > really tricky). 
> > > > BR > > Evan > >> > >> Thanks, > >> Lijo > >> > >>> Signed-off-by: Evan Quan > >>> Change-Id: Ic75326ba7b4b67be8762d5407d02f6c514e1ad35 > >>> --- > >>>drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 1 - > >>>drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 1 - > >>>.../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 14 +-- > >>>.../amd/pm/swsmu/smu11/cyan_skillfish_ppt.c | 10 +- > >>>.../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 112 + > > >> - > >>>.../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 27 ++--- > >>>.../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 28 ++--- > >>>.../gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c | 14 +-- > >>>.../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 23 ++-- > >>>.../drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c | 10 +- > >>>drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c| 21 +--- > >>>drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h| 4 - > >>>12 files changed, 70 insertions(+), 195 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > >>> b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > >>> index ecbc768dfe2f..f0136bf36533 100644 > >>> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > >>> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c > >>> @@ -956,7 +956,6 @@ static int smu_sw_init(void *handle) > >>> bitmap_zero(smu->smu_feature.allowed, SMU_FEATURE_MAX); > >>> > >>> mutex_init(&smu->sensor_lock); > >>> - mutex_init(&smu->metrics_lock); > >>> mutex_init(&smu->message_lock); > >>> > >>> INIT_WORK(&smu->throttling_logging_work, > >>> smu_throttling_logging_work_fn); diff --git > >>> a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h > >>> b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h > >>> index c3efe4fea5e0..63ed807c96f5 100644 > >>> --- a/drivers/gpu/drm/amd/pm/swsmu/inc/am
RE: [PATCH v2] drm/amdgpu: Enable second VCN for certain Navy Flounder.
[Public] Reviewed-by: Guchun Chen Next I will figure out which navy flounder boards have the bad harvest table info from VBIOS, and add a quirk to enable this workaround. Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Deucher, Alexander Sent: Friday, January 7, 2022 1:21 AM To: Zhou, Peng Ju ; amd-gfx@lists.freedesktop.org Subject: RE: [PATCH v2] drm/amdgpu: Enable second VCN for certain Navy Flounder. [Public] > -Original Message- > From: amd-gfx On Behalf Of > Peng Ju Zhou > Sent: Thursday, January 6, 2022 2:30 AM > To: amd-gfx@lists.freedesktop.org > Subject: [PATCH v2] drm/amdgpu: Enable second VCN for certain Navy > Flounder. > > Certain Navy Flounder cards have 2 VCNs, enable it. > > Signed-off-by: Peng Ju Zhou > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > index 580a5b387122..57e001d73ec9 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > @@ -550,7 +550,8 @@ void amdgpu_discovery_harvest_ip(struct > amdgpu_device *adev) > } > /* some IP discovery tables on Navy Flounder don't have this set > correctly */ > if ((adev->ip_versions[UVD_HWIP][1] == IP_VERSION(3, 0, 1)) && > - (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 2))) > + (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 2)) && > + (adev->pdev->revision != 0xFF)) We added this check because some navy flounder boards did not correctly set the harvesting in the IP discovery table. It would be nice to sort that out so we only enable this workaround on the boards with the bad harvest table. I suppose that would probably come down to the PCI revision anyway, so this is probably fine. Acked-by: Alex Deucher > adev->vcn.harvest_config |= > AMDGPU_VCN_HARVEST_VCN1; > if (vcn_harvest_count == adev->vcn.num_vcn_inst) { > adev->harvest_ip_mask |= AMD_HARVEST_IP_VCN_MASK; > -- > 2.33.1
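The quirk Guchun describes could take roughly this shape: a small allow-list keyed on PCI revision, consulted instead of the blanket "revision != 0xFF" test. This is a hypothetical sketch; the revision values below are placeholders, since the affected boards still have to be identified:

        /* Hypothetical sketch; listed revisions are placeholders. */
        static const u8 nv_flounder_bad_harvest_revs[] = { 0xC1, 0xC3 };

        static bool amdgpu_discovery_vcn_harvest_quirk(struct amdgpu_device *adev)
        {
                int i;

                for (i = 0; i < ARRAY_SIZE(nv_flounder_bad_harvest_revs); i++) {
                        if (adev->pdev->revision == nv_flounder_bad_harvest_revs[i])
                                return true;
                }
                return false;
        }

The VCN1 un-harvest in amdgpu_discovery_harvest_ip() would then apply only when the quirk matches, rather than for every non-0xFF revision.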
Re: [PATCH] drm/amd/display: explicitly set is_dsc_supported to false before use
Acked-by: Alex Deucher On Thu, Jan 6, 2022 at 8:40 AM Mario Limonciello wrote: > > When UBSAN is enabled a case is shown on unplugging the display that > this variable hasn't been initialized by `update_dsc_caps`, presumably > when the display was unplugged it wasn't copied from the DPCD. > > Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1956497 > Signed-off-by: Mario Limonciello > --- > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 5d20807b6f88..3d81314e6cb4 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -6093,6 +6093,7 @@ static void update_dsc_caps(struct amdgpu_dm_connector > *aconnector, > struct > dsc_dec_dpcd_caps *dsc_caps) > { > stream->timing.flags.DSC = 0; > + dsc_caps->is_dsc_supported = false; > > if (aconnector->dc_link && (sink->sink_signal == > SIGNAL_TYPE_DISPLAY_PORT || > sink->sink_signal == SIGNAL_TYPE_EDP)) { > -- > 2.25.1 >
Re: Various problems trying to vga-passthrough a Renoir iGPU to a xen/qubes-os hvm
On Thu, Jan 6, 2022 at 10:38 AM Yann Dirson wrote: > > Alex wrote: > > > How is the stolen memory communicated to the driver ? That host > > > physical > > > memory probably has to be mapped at the same guest physical address > > > for > > > the magic to work, right ? > > > > Correct. The driver reads the physical location of that memory from > > hardware registers. Removing this chunk of code from gmc_v9_0.c will > > force the driver to use the BAR, > > That would only be a workaround for a missing mapping of stolen > memory to the guest, right ? Correct. That will use the PCI BAR rather than the underlying physical memory for CPU access to the carve out region. > > > > but I'm not sure if there are any > > other places in the driver that make assumptions about using the > > physical host address or not on APUs off hand. > > gmc_v9_0_vram_gtt_location() updates vm_manager.vram_base_offset from > the same value. I'm not sure I understand why in this case there is > no reason to use the BAR while there are some in gmc_v9_0_mc_init(). > > vram_base_offset then gets used in several places: > > * amdgpu_gmc_init_pdb0, that seems likely enough to be problematic, > right ? > As a sidenote the XGMI offset added earlier gets substracted > here to deduce vram base addr > (a couple of new acronyms there: PDB, PDE -- page directory base/entry?) > > * amdgpu_ttm_map_buffer, amdgpu_vm_bo_update_mapping: those seem to be > as problematic > > * amdgpu_gmc_vram_mc2pa: until I got there I had assumed MC could stand for > "memory controller", but then "MC address of buffer" makes me doubt > > MC = memory controller (as in graphics memory controller). These are GPU addresses not CPU addresses so they should be fine. > > > > if ((adev->flags & AMD_IS_APU) || > > (adev->gmc.xgmi.supported && > > adev->gmc.xgmi.connected_to_cpu)) { > > adev->gmc.aper_base = > > adev->gfxhub.funcs->get_mc_fb_offset(adev) + > > adev->gmc.xgmi.physical_node_id * > > adev->gmc.xgmi.node_segment_size; > > adev->gmc.aper_size = adev->gmc.real_vram_size; > > } > > > Now for the test... it does indeed seem to go much further, I even > loose the dom0's efifb to that black screen hopefully showing the > driver started to setup the hardware. Will probably still have to > hunt down whether it still tries to use efifb afterwards (can't see > why it would not, TBH, given the previous behaviour where it kept > using it after the guest failed to start). > > The log shows many details about TMR loading > > Then as expected: > > [2022-01-06 15:16:09] <6>[5.844589] amdgpu :00:05.0: amdgpu: RAP: > optional rap ta ucode is not available > [2022-01-06 15:16:09] <6>[5.844619] amdgpu :00:05.0: amdgpu: > SECUREDISPLAY: securedisplay ta ucode is not available > [2022-01-06 15:16:09] <7>[5.844639] [drm:amdgpu_device_init.cold > [amdgpu]] hw_init (phase2) of IP block ... > [2022-01-06 15:16:09] <6>[5.845515] amdgpu :00:05.0: amdgpu: SMU is > initialized successfully! > > > not sure about that unhandled interrupt (and a bit worried about messed-up > logs): > > [2022-01-06 15:16:09] <7>[6.010681] amdgpu :00:05.0: > [drm:amdgpu_ring_test_hel[2022-01-06 15:16:10] per [amdgpu]] ring test on > sdma0 succeeded > [2022-01-06 15:16:10] <7>[6.010831] [drm:amdgpu_ih_process [amdgpu]] > amdgpu_ih_process: rptr 0, wptr 32 > [2022-01-06 15:16:10] <7>[6.011002] [drm:amdgpu_irq_dispatch [amdgpu]] > Unhandled interrupt src_id: 243 > > > then comes a first error: > > [2022-01-06 15:16:10] <6>[6.011785] [drm] Display Core initialized with > v3.2.149! 
> [2022-01-06 15:16:10] <6>[6.012714] [drm] DMUB hardware initialized: > version=0x0101001C > [2022-01-06 15:16:10] <3>[6.228263] [drm:dc_dmub_srv_wait_idle [amdgpu]] > *ERROR* Error waiting for DMUB idle: status=3 > [2022-01-06 15:16:10] <7>[6.229125] [drm:amdgpu_dm_init.isra.0.cold > [amdgpu]] amdgpu: freesync_module init done 76c7b459. > [2022-01-06 15:16:10] <7>[6.229677] [drm:amdgpu_dm_init.isra.0.cold > [amdgpu]] amdgpu: hdcp_workqueue init done 87e28b47. > [2022-01-06 15:16:10] <7>[6.229979] [drm:amdgpu_dm_init.isra.0.cold > [amdgpu]] amdgpu_dm_connector_init() > > ... which we can see again several times later though the driver seems > sufficient to finish init: > > [2022-01-06 15:16:10] <6>[6.615615] [drm] late_init of IP block ... > [2022-01-06 15:16:10] <6>[6.615772] [drm] late_init of IP block > ... > [2022-01-06 15:16:10] <6>[6.615801] [drm] late_init of IP block > ... > [2022-01-06 15:16:10] <6>[6.615827] [drm] late_init of IP block ... > [2022-01-06 15:16:10] <3>[6.801790] [drm:dc_dmub_srv_wait_idle [amdgpu]] > *ERROR* Error waiting for DMUB idle: status=3 > [2022-01-06 15:16:10] <7>[6.806079] [drm:drm_minor_register [drm]] > [2022-01-06 1
Re: [PATCH v6 2/6] drm: improve drm_buddy_alloc function
> -Original Message- > From: amd-gfx On Behalf Of Matthew > Auld > Sent: Tuesday, January 4, 2022 7:32 PM > To: Paneer Selvam, Arunpravin ; > dri-de...@lists.freedesktop.org; intel-...@lists.freedesktop.org; > amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; tzimmerm...@suse.de; > jani.nik...@linux.intel.com; Koenig, Christian ; > dan...@ffwll.ch > Subject: Re: [PATCH v6 2/6] drm: improve drm_buddy_alloc function > > On 26/12/2021 22:24, Arunpravin wrote: >> - Make drm_buddy_alloc a single function to handle >>range allocation and non-range allocation demands >> >> - Implemented a new function alloc_range() which allocates >>the requested power-of-two block comply with range limitations >> >> - Moved order computation and memory alignment logic from >>i915 driver to drm buddy >> >> v2: >>merged below changes to keep the build unbroken >> - drm_buddy_alloc_range() becomes obsolete and may be removed >> - enable ttm range allocation (fpfn / lpfn) support in i915 driver >> - apply enhanced drm_buddy_alloc() function to i915 driver >> >> v3(Matthew Auld): >>- Fix alignment issues and remove unnecessary list_empty check >>- add more validation checks for input arguments >>- make alloc_range() block allocations as bottom-up >>- optimize order computation logic >>- replace uint64_t with u64, which is preferred in the kernel >> >> v4(Matthew Auld): >>- keep drm_buddy_alloc_range() function implementation for generic >> actual range allocations >>- keep alloc_range() implementation for end bias allocations >> >> Signed-off-by: Arunpravin > > > >> @@ -73,34 +83,16 @@ static int i915_ttm_buddy_man_alloc(struct >> ttm_resource_manager *man, >> >> n_pages = size >> ilog2(mm->chunk_size); >> >> -do { >> -struct drm_buddy_block *block; >> -unsigned int order; >> - >> -order = fls(n_pages) - 1; >> -GEM_BUG_ON(order > mm->max_order); >> -GEM_BUG_ON(order < min_order); >> - >> -do { >> -mutex_lock(&bman->lock); >> -block = drm_buddy_alloc(mm, order); >> -mutex_unlock(&bman->lock); >> -if (!IS_ERR(block)) >> -break; >> - >> -if (order-- == min_order) { >> -err = -ENOSPC; >> -goto err_free_blocks; >> -} >> -} while (1); >> - >> -n_pages -= BIT(order); >> - >> -list_add_tail(&block->link, &bman_res->blocks); >> - >> -if (!n_pages) >> -break; >> -} while (1); >> +mutex_lock(&bman->lock); >> +err = drm_buddy_alloc(mm, (u64)place->fpfn << PAGE_SHIFT, >> +(u64)place->lpfn << PAGE_SHIFT, > > place->lpfn will currently always be zero for i915, AFAIK. I assume here > we want s/place->lpfn/lpfn/? I replaced place->lpfn with lpfn as below >> +err = drm_buddy_alloc(mm, (u64)place->fpfn << PAGE_SHIFT, >> +(u64)lpfn << PAGE_SHIFT, AFAIK, we need to change only at above location, hope this fixes the firmware allocation issue. > > Also something in this series is preventing i915 from loading on > discrete devices, according to CI. Hopefully that is just the lpfn > issue...which might explain seeing -EINVAL here[1] when allocating some > vram for the firmware. 
> > [1] > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fintel-gfx-ci.01.org%2Ftree%2Fdrm-tip%2FPatchwork_21904%2Fbat-dg1-6%2Fboot0.txt&data=04%7C01%7Carunpravin.paneerselvam%40amd.com%7C4e7c1345d6a649a1682608d9cf8ae82e%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637769017786517465%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=FsSHcnvbMhPhWaZ3tVLAswf04p5tBsbSYqKdYzpib88%3D&reserved=0 > > >> +(u64)n_pages << PAGE_SHIFT, >> + min_page_size, >> + &bman_res->blocks, >> + bman_res->flags); >> +mutex_unlock(&bman->lock); >> +if (unlikely(err)) >> +goto err_free_blocks; >> >> *res = &bman_res->base; >> return 0; >> @@ -266,10 +258,17 @@ int i915_ttm_buddy_man_reserve(struct >> ttm_resource_manager *man, >> { >> struct i915_ttm_buddy_manager *bman = to_buddy_manager(man); >> struct drm_buddy_mm *mm = &bman->mm; >> +unsigned long flags = 0; >> int ret; >> >> +flags |= DRM_BUDDY_RANGE_ALLOCATION; >> + >> mutex_lock(&bman->lock); >> -ret = drm_buddy_alloc_range(mm, &bman->reserved, start, size); >> +ret = drm_buddy_alloc(mm, start, >> +start + size, >> +size, mm->chunk_size, >> +&bman->reserved, >> +flags); >> mutex_unlock(&bman->lock); >> >> return ret; >> diff --git
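The substitution only behaves if lpfn has a sensible fallback for the common place->lpfn == 0 ("no upper limit") case. A sketch of the guard at the top of i915_ttm_buddy_man_alloc(), assuming mm->size is in bytes as elsewhere in the buddy allocator and slotting into the call shown above:

        u64 lpfn = place->lpfn;

        /* lpfn == 0 means no upper bound; clamp to the end of the
         * managed range so the [fpfn, lpfn) window stays valid. */
        if (!lpfn)
                lpfn = mm->size >> PAGE_SHIFT;

        err = drm_buddy_alloc(mm, (u64)place->fpfn << PAGE_SHIFT,
                              (u64)lpfn << PAGE_SHIFT,
                              (u64)n_pages << PAGE_SHIFT,
                              min_page_size,
                              &bman_res->blocks,
                              bman_res->flags);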
Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV
On 2022-01-06 12:18 a.m., JingWen Chen wrote: On 2022/1/6 下午12:59, JingWen Chen wrote: On 2022/1/6 上午2:24, Andrey Grodzovsky wrote: On 2022-01-05 2:59 a.m., Christian König wrote: Am 05.01.22 um 08:34 schrieb JingWen Chen: On 2022/1/5 上午12:56, Andrey Grodzovsky wrote: On 2022-01-04 6:36 a.m., Christian König wrote: Am 04.01.22 um 11:49 schrieb Liu, Monk: [AMD Official Use Only] See the FLR request from the hypervisor is just another source of signaling the need for a reset, similar to each job timeout on each queue. Otherwise you have a race condition between the hypervisor and the scheduler. No it's not, FLR from hypervisor is just to notify guest the hw VF FLR is about to start or was already executed, but host will do FLR anyway without waiting for guest too long Then we have a major design issue in the SRIOV protocol and really need to question this. How do you want to prevent a race between the hypervisor resetting the hardware and the client trying the same because of a timeout? As far as I can see the procedure should be: 1. We detect that a reset is necessary, either because of a fault a timeout or signal from hypervisor. 2. For each of those potential reset sources a work item is send to the single workqueue. 3. One of those work items execute first and prepares the reset. 4. We either do the reset our self or notify the hypervisor that we are ready for the reset. 5. Cleanup after the reset, eventually resubmit jobs etc.. 6. Cancel work items which might have been scheduled from other reset sources. It does make sense that the hypervisor resets the hardware without waiting for the clients for too long, but if we don't follow this general steps we will always have a race between the different components. Monk, just to add to this - if indeed as you say that 'FLR from hypervisor is just to notify guest the hw VF FLR is about to start or was already executed, but host will do FLR anyway without waiting for guest too long' and there is no strict waiting from the hypervisor for IDH_READY_TO_RESET to be recived from guest before starting the reset then setting in_gpu_reset and locking reset_sem from guest side is not really full proof protection from MMIO accesses by the guest - it only truly helps if hypervisor waits for that message before initiation of HW reset. Hi Andrey, this cannot be done. If somehow guest kernel hangs and never has the chance to send the response back, then other VFs will have to wait it reset. All the vfs will hang in this case. Or sometimes the mailbox has some delay and other VFs will also wait. The user of other VFs will be affected in this case. Yeah, agree completely with JingWen. The hypervisor is the one in charge here, not the guest. What the hypervisor should do (and it already seems to be designed that way) is to send the guest a message that a reset is about to happen and give it some time to response appropriately. The guest on the other hand then tells the hypervisor that all processing has stopped and it is ready to restart. If that doesn't happen in time the hypervisor should eliminate the guest probably trigger even more severe consequences, e.g. restart the whole VM etc... Christian. So what's the end conclusion here regarding dropping this particular patch ? Seems to me we still need to drop it to prevent driver's MMIO access to the GPU during reset from various places in the code. Andrey Hi Andrey & Christian, I have ported your patch(drop the reset_sem and in_gpu_reset in flr work) and run some tests. 
If a engine hang during an OCL benchmark(using kfd), we can see the logs below: Did you port the entire patchset or just 'drm/amd/virt: Drop concurrent GPU reset protection for SRIOV' ? [ 397.190727] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.301496] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.406601] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.532343] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.642251] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.746634] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.850761] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.960544] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 398.065218] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 398.182173] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 398.288264] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 398.394712] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 428.400582] [drm] clean up the vf2pf work item [ 428.500528] amdgpu :00:07.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:8 pasid:32771, for process xgemmStandalone pid 3557 thread xgemmStandalone pid 3557) [ 428.527576] amdgpu :00:07.0: amdgpu: in page starting at address 0x7fc991c04000 from client 0x1b (UTCL2) [ 437.5313
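A minimal sketch (not amdgpu code) of the serialization scheme described earlier in the thread: every reset source -- job timeout, fault, FLR notification from the hypervisor -- only queues work on one ordered workqueue, so exactly one handler prepares and performs the reset while the others can be cancelled afterwards:

        #include <linux/workqueue.h>

        struct reset_domain {
                struct workqueue_struct *wq;    /* ordered: one reset at a time */
                struct work_struct timeout_work;
                struct work_struct flr_work;
        };

        static void do_reset(struct work_struct *work)
        {
                /* prepare the reset, do it (or tell the hypervisor we are
                 * ready), clean up, resubmit jobs, then cancel the work
                 * items queued by the other reset sources. */
        }

        static int reset_domain_init(struct reset_domain *rd)
        {
                rd->wq = alloc_ordered_workqueue("gpu-reset", 0);
                if (!rd->wq)
                        return -ENOMEM;
                INIT_WORK(&rd->timeout_work, do_reset);
                INIT_WORK(&rd->flr_work, do_reset);
                return 0;
        }

        /* any source then just does: queue_work(rd->wq, &rd->flr_work); */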
Re: [PATCH v2] drm/amdgpu: Unmap MMIO mappings when device is not unplugged
Got it See bellow one small comment, with that the patch is Reviewed-by: Andrey Grodzovsky On 2022-01-05 9:24 p.m., Shi, Leslie wrote: [AMD Official Use Only] Hi Andrey, It is the following patch calls to amdgpu_device_unmap_mmio() conditioned on device unplugged. 3efb17ae7e92 "drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure" Regards, Leslie -Original Message- From: Grodzovsky, Andrey Sent: Thursday, January 6, 2022 2:22 AM To: Shi, Leslie ; Lazar, Lijo ; amd-gfx@lists.freedesktop.org Cc: Chen, Guchun Subject: Re: [PATCH v2] drm/amdgpu: Unmap MMIO mappings when device is not unplugged On 2022-01-04 11:23 p.m., Leslie Shi wrote: Patch: 3efb17ae7e92 ("drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure") makes call to amdgpu_device_unmap_mmio() conditioned on device unplugged. This patch unmaps MMIO mappings even when device is not unplugged. drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure I don't see the 'call to amdgpu_device_unmap_mmio() conditioned on device unplugged' part in this patch Also, please add 'v2:bla bla bla' part in patch description telling what was done in v2 Andrey Signed-off-by: Leslie Shi --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +++ 3 files changed, 34 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 412f377f80b1..16dc16c860cc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3832,6 +3832,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev) { + Drop the new line Andrey /* Clear all CPU mappings pointing to this device */ unmap_mapping_range(adev->ddev.anon_inode->i_mapping, 0, 0, 1); @@ -3912,6 +3913,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) void amdgpu_device_fini_sw(struct amdgpu_device *adev) { + int idx; + amdgpu_fence_driver_sw_fini(adev); amdgpu_device_ip_fini(adev); release_firmware(adev->firmware.gpu_info_fw); @@ -3936,6 +3939,14 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev) if ((adev->pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA) vga_client_register(adev->pdev, NULL, NULL, NULL); + if (drm_dev_enter(adev_to_drm(adev), &idx)) { + + iounmap(adev->rmmio); + adev->rmmio = NULL; + amdgpu_device_doorbell_fini(adev); + drm_dev_exit(idx); + } + if (IS_ENABLED(CONFIG_PERF_EVENTS)) amdgpu_pmu_fini(adev); if (adev->mman.discovery_bin) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 156002db24e1..ff9dc377a3a0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -33,6 +33,7 @@ #include #include +#include #include #include #include "amdgpu.h" @@ -1061,7 +1062,18 @@ int amdgpu_bo_init(struct amdgpu_device *adev) */ void amdgpu_bo_fini(struct amdgpu_device *adev) { + int idx; + amdgpu_ttm_fini(adev); + + if (drm_dev_enter(adev_to_drm(adev), &idx)) { + + if (!adev->gmc.xgmi.connected_to_cpu) { + arch_phys_wc_del(adev->gmc.vram_mtrr); + arch_io_free_memtype_wc(adev->gmc.aper_base, adev->gmc.aper_size); + } + drm_dev_exit(idx); + } } /** diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 
367abed1d6e6..ea897feeddd2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -42,6 +42,7 @@ #include #include +#include #include #include #include @@ -1801,6 +1802,7 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) */ void amdgpu_ttm_fini(struct amdgpu_device *adev) { + int idx; if (!adev->mman.initialized) return; @@ -1815,6 +1817,15 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev) NULL, NULL); amdgpu_ttm_fw_reserve_vram_fini(adev); + if (drm_dev_enter(adev_to_drm(adev), &idx)) { + + if (adev->mman.aper_base_kaddr) + iounmap(adev->mman.aper_base_kaddr); + adev->mman.aper_base_kaddr = NULL; + + drm_dev_exit(idx); + } + amdgpu_vram_mgr_fini(adev); amdgpu_gtt_mgr_fini(adev); amdgpu_preempt_mgr_fini(adev);
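For reference, the drm_dev_enter()/drm_dev_exit() blocks added in this patch all follow the same hot-unplug guard pattern; condensed here with placeholder names ddev and mmio_kaddr, just to show the shape of it:

        int idx;

        /* drm_dev_enter() fails once drm_dev_unplug() has run, and the
         * SRCU read section keeps the unplug path from completing while
         * we still touch the mapping. */
        if (drm_dev_enter(ddev, &idx)) {
                iounmap(mmio_kaddr);
                mmio_kaddr = NULL;
                drm_dev_exit(idx);
        }

In the unplugged case the MMIO space was already torn down by amdgpu_device_unmap_mmio(), which is why skipping the body is safe rather than a leak.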
Re: [PATCH v6 4/6] drm: implement a method to free unused pages
On 04/01/22 7:41 pm, Matthew Auld wrote: > On 26/12/2021 22:24, Arunpravin wrote: >> On contiguous allocation, we round up the size >> to the *next* power of 2, implement a function >> to free the unused pages after the newly allocate block. >> >> v2(Matthew Auld): >>- replace function name 'drm_buddy_free_unused_pages' with >> drm_buddy_block_trim >>- replace input argument name 'actual_size' with 'new_size' >>- add more validation checks for input arguments >>- add overlaps check to avoid needless searching and splitting >>- merged the below patch to see the feature in action >> - add free unused pages support to i915 driver >>- lock drm_buddy_block_trim() function as it calls mark_free/mark_split >> are all globally visible >> >> v3(Matthew Auld): >>- remove trim method error handling as we address the failure case >> at drm_buddy_block_trim() function >> >> v4: >>- in case of trim, at __alloc_range() split_block failure path >> marks the block as free and removes it from the original list, >> potentially also freeing it, to overcome this problem, we turn >> the drm_buddy_block_trim() input node into a temporary node to >> prevent recursively freeing itself, but still retain the >> un-splitting/freeing of the other nodes(Matthew Auld) >> >>- modify the drm_buddy_block_trim() function return type >> >> Signed-off-by: Arunpravin >> --- >> drivers/gpu/drm/drm_buddy.c | 61 +++ >> drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 8 +++ >> include/drm/drm_buddy.h | 4 ++ >> 3 files changed, 73 insertions(+) >> >> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c >> index eddc1eeda02e..855afcaf7edd 100644 >> --- a/drivers/gpu/drm/drm_buddy.c >> +++ b/drivers/gpu/drm/drm_buddy.c >> @@ -538,6 +538,67 @@ static int __drm_buddy_alloc_range(struct drm_buddy_mm >> *mm, >> return __alloc_range(mm, &dfs, start, size, blocks); >> } >> >> +/** >> + * drm_buddy_block_trim - free unused pages >> + * >> + * @mm: DRM buddy manager >> + * @new_size: original size requested >> + * @blocks: output list head to add allocated blocks >> + * >> + * For contiguous allocation, we round up the size to the nearest >> + * power of two value, drivers consume *actual* size, so remaining >> + * portions are unused and it can be freed. >> + */ >> +void drm_buddy_block_trim(struct drm_buddy_mm *mm, >> + u64 new_size, >> + struct list_head *blocks) > > It might be better to just return the error, and let the user decide if > they want to ignore it? Also we might want some kind of unit test for > this, so having an actual return value might be useful there. 
> sure will revert it >> +{ >> +struct drm_buddy_block *parent; >> +struct drm_buddy_block *block; >> +LIST_HEAD(dfs); >> +u64 new_start; >> +int err; >> + >> +if (!list_is_singular(blocks)) >> +return; >> + >> +block = list_first_entry(blocks, >> + struct drm_buddy_block, >> + link); >> + >> +if (!drm_buddy_block_is_allocated(block)) >> +return; >> + >> +if (new_size > drm_buddy_block_size(mm, block)) >> +return; >> + >> +if (!new_size && !IS_ALIGNED(new_size, mm->chunk_size)) >> +return; >> + >> +if (new_size == drm_buddy_block_size(mm, block)) >> +return; >> + >> +list_del(&block->link); >> +mark_free(mm, block); >> +mm->avail += drm_buddy_block_size(mm, block); >> + >> +/* Prevent recursively freeing this node */ >> +parent = block->parent; >> +block->parent = NULL; >> + >> +new_start = drm_buddy_block_offset(block); >> +list_add(&block->tmp_link, &dfs); >> +err = __alloc_range(mm, &dfs, new_start, new_size, blocks); >> +if (err) { >> +mark_allocated(block); >> +mm->avail -= drm_buddy_block_size(mm, block); >> +list_add(&block->link, blocks); >> +} >> + >> +block->parent = parent; >> +} >> +EXPORT_SYMBOL(drm_buddy_block_trim); >> + >> /** >>* drm_buddy_alloc - allocate power-of-two blocks >>* >> diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c >> b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c >> index 7c58efb60dba..05f924f32e96 100644 >> --- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c >> +++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c >> @@ -97,6 +97,14 @@ static int i915_ttm_buddy_man_alloc(struct >> ttm_resource_manager *man, >> if (unlikely(err)) >> goto err_free_blocks; >> >> +if (place->flags & TTM_PL_FLAG_CONTIGUOUS) { >> +mutex_lock(&bman->lock); >> +drm_buddy_block_trim(mm, >> +(u64)n_pages << PAGE_SHIFT, > > AFAIK, n_pages has already been rounded up to the next power-of-two > here, so this becomes a noop.
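Folding both review comments together -- an int return, and trimming with the size requested before the power-of-two round-up (n_pages has already been rounded at this point, so trimming with it is a no-op) -- the call site might end up looking like the sketch below; original_size is a hypothetical local captured before the rounding:

        int drm_buddy_block_trim(struct drm_buddy_mm *mm,
                                 u64 new_size,
                                 struct list_head *blocks);

        if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
                mutex_lock(&bman->lock);
                err = drm_buddy_block_trim(mm, original_size,
                                           &bman_res->blocks);
                mutex_unlock(&bman->lock);
                if (err)
                        /* Not fatal: the block simply keeps its rounded-up
                         * power-of-two size, but a unit test can now assert
                         * on the return value. */
                        err = 0;
        }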
RE: [PATCH v2] drm/amdgpu: Enable second VCN for certain Navy Flounder.
[Public] > -Original Message- > From: amd-gfx On Behalf Of > Peng Ju Zhou > Sent: Thursday, January 6, 2022 2:30 AM > To: amd-gfx@lists.freedesktop.org > Subject: [PATCH v2] drm/amdgpu: Enable second VCN for certain Navy > Flounder. > > Certain Navy Flounder cards have 2 VCNs, enable it. > > Signed-off-by: Peng Ju Zhou > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > index 580a5b387122..57e001d73ec9 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c > @@ -550,7 +550,8 @@ void amdgpu_discovery_harvest_ip(struct > amdgpu_device *adev) > } > /* some IP discovery tables on Navy Flounder don't have this set > correctly */ > if ((adev->ip_versions[UVD_HWIP][1] == IP_VERSION(3, 0, 1)) && > - (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 2))) > + (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 2)) && > + (adev->pdev->revision != 0xFF)) We added this check because some navy flounder boards did not correctly set the harvesting in the IP discovery table. It would be nice to sort that out so we only enable this workaround on the boards with the bad harvest table. I suppose that would probably come down to the PCI revision anyway, so this is probably fine. Acked-by: Alex Deucher > adev->vcn.harvest_config |= > AMDGPU_VCN_HARVEST_VCN1; > if (vcn_harvest_count == adev->vcn.num_vcn_inst) { > adev->harvest_ip_mask |= AMD_HARVEST_IP_VCN_MASK; > -- > 2.33.1
Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process
Am 2022-01-06 um 4:05 a.m. schrieb Christian König: > Am 05.01.22 um 17:16 schrieb Felix Kuehling: >> [SNIP] But KFD doesn't know anything about the inherited BOs from the parent process. >>> Ok, why that? When the KFD is reinitializing it's context why >>> shouldn't it cleanup those VMAs? >> That cleanup has to be initiated by user mode. Basically closing the old >> KFD and DRM file descriptors, cleaning up all the user mode VM state, >> unmapping all the VMAs, etc. Then it reopens KFD and the render nodes >> and starts from scratch. >> >> User mode will do this automatically when it tries to reinitialize ROCm. >> However, in this case the child process doesn't do that (e.g. a python >> application using the multi-processing package). The child process does >> not use ROCm. But you're left with all the dangling VMAs in the child >> process indefinitely. > > Oh, not that one again. I'm unfortunately pretty sure that this is an > clear NAK then. > > This python multi-processing package is violating various > specifications by doing this fork() and we already had multiple > discussions about that. Well, it's in wide-spread use. We can't just throw up our hands and say they're buggy and not supported. Also, why does your ACK or NAK depend on this at all. If it's the right thing to do, it's the right thing to do regardless of who benefits from it. In addition, how can a child process that doesn't even use the GPU be in violation of any GPU-driver related specifications. Regards, Felix > > Let's talk about this on Mondays call. Thanks for giving the whole > context. > > Regards, > Christian. > >> >> Regards, >> Felix >> >
Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process
Am 2022-01-06 um 11:48 a.m. schrieb Christian König: > Am 06.01.22 um 17:45 schrieb Felix Kuehling: >> Am 2022-01-06 um 4:05 a.m. schrieb Christian König: >>> Am 05.01.22 um 17:16 schrieb Felix Kuehling: [SNIP] >> But KFD doesn't know anything about the inherited BOs >> from the parent process. > Ok, why that? When the KFD is reinitializing it's context why > shouldn't it cleanup those VMAs? That cleanup has to be initiated by user mode. Basically closing the old KFD and DRM file descriptors, cleaning up all the user mode VM state, unmapping all the VMAs, etc. Then it reopens KFD and the render nodes and starts from scratch. User mode will do this automatically when it tries to reinitialize ROCm. However, in this case the child process doesn't do that (e.g. a python application using the multi-processing package). The child process does not use ROCm. But you're left with all the dangling VMAs in the child process indefinitely. >>> Oh, not that one again. I'm unfortunately pretty sure that this is an >>> clear NAK then. >>> >>> This python multi-processing package is violating various >>> specifications by doing this fork() and we already had multiple >>> discussions about that. >> Well, it's in wide-spread use. We can't just throw up our hands and say >> they're buggy and not supported. > > Because that's not my NAK, but rather from upstream. > >> Also, why does your ACK or NAK depend on this at all. If it's the right >> thing to do, it's the right thing to do regardless of who benefits from >> it. In addition, how can a child process that doesn't even use the GPU >> be in violation of any GPU-driver related specifications. > > The argument is that the application is broken and needs to be fixed > instead of worked around inside the kernel. I still don't get how they the application is broken. Like I said, the child process is not using the GPU. How can the application be fixed in this case? Are you saying, any application that forks and doesn't immediately call exec is broken? Or does an application that forks need to be aware that some other part of the application used the GPU and explicitly free any GPU resources? Thanks, Felix > > Regards, > Christian. > >> >> Regards, >> Felix >> >> >>> Let's talk about this on Mondays call. Thanks for giving the whole >>> context. >>> >>> Regards, >>> Christian. >>> Regards, Felix >
Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process
Am 06.01.22 um 17:45 schrieb Felix Kuehling: Am 2022-01-06 um 4:05 a.m. schrieb Christian König: Am 05.01.22 um 17:16 schrieb Felix Kuehling: [SNIP] But KFD doesn't know anything about the inherited BOs from the parent process. Ok, why that? When the KFD is reinitializing it's context why shouldn't it cleanup those VMAs? That cleanup has to be initiated by user mode. Basically closing the old KFD and DRM file descriptors, cleaning up all the user mode VM state, unmapping all the VMAs, etc. Then it reopens KFD and the render nodes and starts from scratch. User mode will do this automatically when it tries to reinitialize ROCm. However, in this case the child process doesn't do that (e.g. a python application using the multi-processing package). The child process does not use ROCm. But you're left with all the dangling VMAs in the child process indefinitely. Oh, not that one again. I'm unfortunately pretty sure that this is an clear NAK then. This python multi-processing package is violating various specifications by doing this fork() and we already had multiple discussions about that. Well, it's in wide-spread use. We can't just throw up our hands and say they're buggy and not supported. Because that's not my NAK, but rather from upstream. Also, why does your ACK or NAK depend on this at all. If it's the right thing to do, it's the right thing to do regardless of who benefits from it. In addition, how can a child process that doesn't even use the GPU be in violation of any GPU-driver related specifications. The argument is that the application is broken and needs to be fixed instead of worked around inside the kernel. Regards, Christian. Regards, Felix Let's talk about this on Mondays call. Thanks for giving the whole context. Regards, Christian. Regards, Felix
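For reference, the mechanism being argued about is the standard vm_flags one: a mapping marked VM_DONTCOPY is simply not duplicated into the child on fork(). A generic sketch (not the actual drm/ttm patch) of a driver marking its mappings that way:

        #include <linux/mm.h>

        /* Sketch only: keep a device mapping out of forked children so a
         * child that never touches the GPU does not inherit dangling GPU
         * VMAs from its parent. */
        static void mark_mapping_not_inherited(struct vm_area_struct *vma)
        {
                vma->vm_flags |= VM_DONTCOPY;
        }

The disagreement in this thread is not about the mechanism, but about whether the kernel should change behaviour to accommodate frameworks that fork() without exec(), such as the python multiprocessing package.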
[PATCH v4 3/3] drm/amd/display: Use requested power state to avoid HPD WA during s0ix
The WA from commit 5965280abd30 ("drm/amd/display: Apply w/a for hard hang on HPD") causes a regression in s0ix where the system will fail to resume properly. This may be because an HPD was active the last time clocks were updated but clocks didn't get updated again during s0ix. So add an extra call to update clocks as part of the suspend routine: dm_suspend->dc_set_power_state->clk_mgr_optimize_pwr_state In case HPD is set during this time, also check if the call happened during suspend to allow overriding the WA. Cc: Qingqing Zhuo Reported-by: Scott Bruce Reported-by: Chris Hixon Reported-by: spassw...@web.de Link: https://bugzilla.kernel.org/show_bug.cgi?id=215436 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1821 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1852 Fixes: 5965280abd30 ("drm/amd/display: Apply w/a for hard hang on HPD") Fixes: 1bd3bc745e7f ("drm/amd/display: Extend w/a for hard hang on HPD to dcn20") Signed-off-by: Mario Limonciello --- changes from v3->v4: * Move into new function * Explicitly check that current_state is active for safety * Change metadata from BugLink to Link changes from v2->v3: * stop depending on adev, get value of power state from display core changes from v1->v2: * Add fallthrough statement * Extend case to check if call was explicitly in s0ix since #1852 showed hpd_state can be set at this time too * Adjust commit message and title * Add extra commit and bug fixed to metadata drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c | 4 drivers/gpu/drm/amd/display/dc/core/dc.c | 3 +++ 2 files changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c index 20974829f304..d2e1938555dc 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c @@ -129,6 +129,10 @@ static bool check_really_safe_to_lower(struct dc *dc, struct dc_state *context) if (display_count > 0) return false; + if (dc->current_state && + dc->current_state->power_state == DC_ACPI_CM_POWER_STATE_D3) + return true; + for (irq_src = DC_IRQ_SOURCE_HPD1; irq_src <= DC_IRQ_SOURCE_HPD5; irq_src++) { hpd_state = dc_get_hpd_state_dcn21(dc->res_pool->irqs, irq_src); if (hpd_state) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 8edbb6c70512..716e055a61c9 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -3302,6 +3302,9 @@ void dc_set_power_state( } break; + case DC_ACPI_CM_POWER_STATE_D3: + clk_mgr_optimize_pwr_state(dc, dc->clk_mgr); + fallthrough; default: ASSERT(dc->current_state->stream_count == 0); /* Zero out the current context so that on resume we start with -- 2.25.1
[PATCH v4 2/3] drm/amd/display: Split out checks for applying HPD WA into a separate function
This will make it cleaner to read through the logic for an upcoming change. Signed-off-by: Mario Limonciello --- Changes from v3->v4: * New patch .../amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c | 30 +++ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c index fbda42313bfe..20974829f304 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c @@ -120,6 +120,23 @@ static void rn_update_clocks_update_dpp_dto(struct clk_mgr_internal *clk_mgr, } } +static bool check_really_safe_to_lower(struct dc *dc, struct dc_state *context) +{ + int display_count = rn_get_active_display_cnt_wa(dc, context); + uint32_t hpd_state; + int irq_src; + + if (display_count > 0) + return false; + + for (irq_src = DC_IRQ_SOURCE_HPD1; irq_src <= DC_IRQ_SOURCE_HPD5; irq_src++) { + hpd_state = dc_get_hpd_state_dcn21(dc->res_pool->irqs, irq_src); + if (hpd_state) + break; + } + + return !hpd_state; +} static void rn_update_clocks(struct clk_mgr *clk_mgr_base, struct dc_state *context, @@ -128,12 +145,9 @@ static void rn_update_clocks(struct clk_mgr *clk_mgr_base, struct clk_mgr_internal *clk_mgr = TO_CLK_MGR_INTERNAL(clk_mgr_base); struct dc_clocks *new_clocks = &context->bw_ctx.bw.dcn.clk; struct dc *dc = clk_mgr_base->ctx->dc; - int display_count; - int irq_src; bool update_dppclk = false; bool update_dispclk = false; bool dpp_clock_lowered = false; - uint32_t hpd_state; struct dmcu *dmcu = clk_mgr_base->ctx->dc->res_pool->dmcu; @@ -148,16 +162,8 @@ static void rn_update_clocks(struct clk_mgr *clk_mgr_base, /* check that we're not already in lower */ if (clk_mgr_base->clks.pwr_state != DCN_PWR_STATE_LOW_POWER) { - display_count = rn_get_active_display_cnt_wa(dc, context); - - for (irq_src = DC_IRQ_SOURCE_HPD1; irq_src <= DC_IRQ_SOURCE_HPD5; irq_src++) { - hpd_state = dc_get_hpd_state_dcn21(dc->res_pool->irqs, irq_src); - if (hpd_state) - break; - } - /* if we can go lower, go lower */ - if (display_count == 0 && !hpd_state) { + if (check_really_safe_to_lower(dc, context)) { rn_vbios_smu_set_dcn_low_power_state(clk_mgr, DCN_PWR_STATE_LOW_POWER); /* update power state */ clk_mgr_base->clks.pwr_state = DCN_PWR_STATE_LOW_POWER; -- 2.25.1
[PATCH v4 1/3] drm/amd/display: Add power_state member into current_state
This can be used by the display core to let decisions be made based upon the requested power state. Cc: Qingqing Zhuo Cc: Scott Bruce Cc: Chris Hixon Cc: spassw...@web.de Signed-off-by: Mario Limonciello --- changes from v3->v4: * Initialize power_state when context is created (0 shouldn't be a valid state) changes from v2->v3: * New patch drivers/gpu/drm/amd/display/dc/core/dc.c| 3 +++ drivers/gpu/drm/amd/display/dc/inc/core_types.h | 1 + 2 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 91c4874473d6..8edbb6c70512 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1960,6 +1960,7 @@ static void init_state(struct dc *dc, struct dc_state *context) #ifdef CONFIG_DRM_AMD_DC_DCN memcpy(&context->bw_ctx.dml, &dc->dml, sizeof(struct display_mode_lib)); #endif + context->power_state = DC_ACPI_CM_POWER_STATE_D0; } struct dc_state *dc_create_state(struct dc *dc) @@ -3281,6 +3282,8 @@ void dc_set_power_state( if (!dc->current_state) return; + dc->current_state->power_state = power_state; + switch (power_state) { case DC_ACPI_CM_POWER_STATE_D0: dc_resource_state_construct(dc, dc->current_state); diff --git a/drivers/gpu/drm/amd/display/dc/inc/core_types.h b/drivers/gpu/drm/amd/display/dc/inc/core_types.h index 943240e2809e..6bd0aeed1856 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/core_types.h +++ b/drivers/gpu/drm/amd/display/dc/inc/core_types.h @@ -506,6 +506,7 @@ struct dc_state { struct { unsigned int stutter_period_us; } perf_params; + enum dc_acpi_cm_power_state power_state; }; #endif /* _CORE_TYPES_H_ */ -- 2.25.1
Re: Various problems trying to vga-passthrough a Renoir iGPU to a xen/qubes-os hvm
Alex wrote: > > How is the stolen memory communicated to the driver ? That host > > physical > > memory probably has to be mapped at the same guest physical address > > for > > the magic to work, right ? > > Correct. The driver reads the physical location of that memory from > hardware registers. Removing this chunk of code from gmc_v9_0.c will > force the driver to use the BAR, That would only be a workaround for a missing mapping of stolen memory to the guest, right ? > but I'm not sure if there are any > other places in the driver that make assumptions about using the > physical host address or not on APUs off hand. gmc_v9_0_vram_gtt_location() updates vm_manager.vram_base_offset from the same value. I'm not sure I understand why in this case there is no reason to use the BAR while there are some in gmc_v9_0_mc_init(). vram_base_offset then gets used in several places: * amdgpu_gmc_init_pdb0, that seems likely enough to be problematic, right ? As a sidenote the XGMI offset added earlier gets substracted here to deduce vram base addr (a couple of new acronyms there: PDB, PDE -- page directory base/entry?) * amdgpu_ttm_map_buffer, amdgpu_vm_bo_update_mapping: those seem to be as problematic * amdgpu_gmc_vram_mc2pa: until I got there I had assumed MC could stand for "memory controller", but then "MC address of buffer" makes me doubt > > if ((adev->flags & AMD_IS_APU) || > (adev->gmc.xgmi.supported && > adev->gmc.xgmi.connected_to_cpu)) { > adev->gmc.aper_base = > adev->gfxhub.funcs->get_mc_fb_offset(adev) + > adev->gmc.xgmi.physical_node_id * > adev->gmc.xgmi.node_segment_size; > adev->gmc.aper_size = adev->gmc.real_vram_size; > } Now for the test... it does indeed seem to go much further, I even loose the dom0's efifb to that black screen hopefully showing the driver started to setup the hardware. Will probably still have to hunt down whether it still tries to use efifb afterwards (can't see why it would not, TBH, given the previous behaviour where it kept using it after the guest failed to start). The log shows many details about TMR loading Then as expected: [2022-01-06 15:16:09] <6>[5.844589] amdgpu :00:05.0: amdgpu: RAP: optional rap ta ucode is not available [2022-01-06 15:16:09] <6>[5.844619] amdgpu :00:05.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available [2022-01-06 15:16:09] <7>[5.844639] [drm:amdgpu_device_init.cold [amdgpu]] hw_init (phase2) of IP block ... [2022-01-06 15:16:09] <6>[5.845515] amdgpu :00:05.0: amdgpu: SMU is initialized successfully! not sure about that unhandled interrupt (and a bit worried about messed-up logs): [2022-01-06 15:16:09] <7>[6.010681] amdgpu :00:05.0: [drm:amdgpu_ring_test_hel[2022-01-06 15:16:10] per [amdgpu]] ring test on sdma0 succeeded [2022-01-06 15:16:10] <7>[6.010831] [drm:amdgpu_ih_process [amdgpu]] amdgpu_ih_process: rptr 0, wptr 32 [2022-01-06 15:16:10] <7>[6.011002] [drm:amdgpu_irq_dispatch [amdgpu]] Unhandled interrupt src_id: 243 then comes a first error: [2022-01-06 15:16:10] <6>[6.011785] [drm] Display Core initialized with v3.2.149! [2022-01-06 15:16:10] <6>[6.012714] [drm] DMUB hardware initialized: version=0x0101001C [2022-01-06 15:16:10] <3>[6.228263] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3 [2022-01-06 15:16:10] <7>[6.229125] [drm:amdgpu_dm_init.isra.0.cold [amdgpu]] amdgpu: freesync_module init done 76c7b459. [2022-01-06 15:16:10] <7>[6.229677] [drm:amdgpu_dm_init.isra.0.cold [amdgpu]] amdgpu: hdcp_workqueue init done 87e28b47. 
[2022-01-06 15:16:10] <7>[6.229979] [drm:amdgpu_dm_init.isra.0.cold [amdgpu]] amdgpu_dm_connector_init() ... which we can see again several times later though the driver seems sufficient to finish init: [2022-01-06 15:16:10] <6>[6.615615] [drm] late_init of IP block ... [2022-01-06 15:16:10] <6>[6.615772] [drm] late_init of IP block ... [2022-01-06 15:16:10] <6>[6.615801] [drm] late_init of IP block ... [2022-01-06 15:16:10] <6>[6.615827] [drm] late_init of IP block ... [2022-01-06 15:16:10] <3>[6.801790] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3 [2022-01-06 15:16:10] <7>[6.806079] [drm:drm_minor_register [drm]] [2022-01-06 15:16:10] <7>[6.806195] [drm:drm_minor_register [drm]] new minor registered 128 [2022-01-06 15:16:10] <7>[6.806223] [drm:drm_minor_register [drm]] [2022-01-06 15:16:10] <7>[6.806289] [drm:drm_minor_register [drm]] new minor registered 0 [2022-01-06 15:16:10] <7>[6.806355] [drm:drm_sysfs_connector_add [drm]] adding "eDP-1" to sysfs [2022-01-06 15:16:10] <7>[6.806424] [drm:drm_dp_aux_register_devnode [drm_kms_helper]] drm_dp_aux_dev: aux [AMDGPU DM aux hw bus 0] registe
RE: [PATCH] drm/amdgpu: Use correct VIEWPORT_DIMENSION for DCN2
[Public] > -Original Message- > From: amd-gfx On Behalf Of > Harry Wentland > Sent: Wednesday, January 5, 2022 3:39 PM > To: amd-gfx@lists.freedesktop.org; Koenig, Christian > ; Lee, Becle ; Huang, > Rex ; Huang, Ray > Cc: Wentland, Harry > Subject: [PATCH] drm/amdgpu: Use correct VIEWPORT_DIMENSION for > DCN2 > > For some reason this file isn't using the appropriate register headers for DCN > headers, which means that on DCN2 we're getting the > VIEWPORT_DIMENSION offset wrong. > > This means that we're not correctly carving out the framebuffer memory > correctly for a framebuffer allocated by EFI and therefore see corruption > when loading amdgpu before the display driver takes over control of the > framebuffer scanout. > > Fix this by checking the DCE_HWIP and picking the correct offset accordingly. > > Long-term we should expose this info from DC as GMC shouldn't need to > know about DCN registers. > > Signed-off-by: Harry Wentland > --- > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > index 57f2729a7bd0..8367ecf61af1 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > @@ -72,6 +72,9 @@ > #define mmDCHUBBUB_SDPIF_MMIO_CNTRL_0 > 0x049d > #define mmDCHUBBUB_SDPIF_MMIO_CNTRL_0_BASE_IDX > 2 > > +#define DCN2_mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION > 0x05ea > +#define > DCN2_mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION_BASE_IDX > 2 > + We normally append the _DCN2 to the end in these cases of the macro. With that fixed, Reviewed-by: Alex Deucher > > static const char *gfxhub_client_ids[] = { > "CB", > @@ -1142,7 +1145,6 @@ static unsigned > gmc_v9_0_get_vbios_fb_size(struct amdgpu_device *adev) > switch (adev->ip_versions[DCE_HWIP][0]) { > case IP_VERSION(1, 0, 0): > case IP_VERSION(1, 0, 1): > - case IP_VERSION(2, 1, 0): > viewport = RREG32_SOC15(DCE, 0, > mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION); > size = (REG_GET_FIELD(viewport, > > HUBP0_DCSURF_PRI_VIEWPORT_DIMENSION, PRI_VIEWPORT_HEIGHT) * > @@ -1150,6 +1152,14 @@ static unsigned > gmc_v9_0_get_vbios_fb_size(struct amdgpu_device *adev) > > HUBP0_DCSURF_PRI_VIEWPORT_DIMENSION, PRI_VIEWPORT_WIDTH) * > 4); > break; > + case IP_VERSION(2, 1, 0): > + viewport = RREG32_SOC15(DCE, 0, > DCN2_mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION); > + size = (REG_GET_FIELD(viewport, > + > HUBP0_DCSURF_PRI_VIEWPORT_DIMENSION, PRI_VIEWPORT_HEIGHT) * > + REG_GET_FIELD(viewport, > + > HUBP0_DCSURF_PRI_VIEWPORT_DIMENSION, PRI_VIEWPORT_WIDTH) * > + 4); > + break; > default: > viewport = RREG32_SOC15(DCE, 0, > mmSCL0_VIEWPORT_SIZE); > size = (REG_GET_FIELD(viewport, > SCL0_VIEWPORT_SIZE, VIEWPORT_HEIGHT) * > -- > 2.34.1
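For reference, one reading of the naming suggestion above, as a sketch only (offsets copied from the posted patch; the final name is up to the author's respin):

/* _DCN2 appended to the end of the macro name, as requested in the review: */
#define mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION_DCN2		0x05ea
#define mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION_DCN2_BASE_IDX	2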
RE: [PATCH v2] drm/amd/display: explicitly update clocks when DC is set to D3 in s0i3
[Public] > -Original Message- > From: Thorsten Leemhuis > Sent: Thursday, January 6, 2022 05:39 > To: Limonciello, Mario ; amd- > g...@lists.freedesktop.org > Cc: Zhuo, Qingqing (Lillian) ; Scott Bruce > ; Chris Hixon ; > spassw...@web.de > Subject: Re: [PATCH v2] drm/amd/display: explicitly update clocks when DC is > set > to D3 in s0i3 > > Hi, this is your Linux kernel regression tracker speaking. > > On 05.01.22 18:06, Mario Limonciello wrote: > > The WA from commit 5965280abd30 ("drm/amd/display: Apply w/a for > > hard hang on HPD") causes a regression in s0ix where the system will > > fail to resume properly. This may be because an HPD was active the last > > time clocks were updated but clocks didn't get updated again during s0ix. > > > > So add an extra call to update clocks as part of the suspend routine: > > dm_suspend->dc_set_power_state->clk_mgr_optimize_pwr_state > > > > In case HPD is set during this time, also check if the call happened during > > suspend to allow overriding the WA. > > Thx for this. Our of interest: is that something that should still go > into v5.16? Not in v2. There were review comments that led to creating a V3. Once V3 is reviewed if there is time remaining, yes. If not, then it will be a great candidate for 5.16.1. > > > Cc: Qingqing Zhuo > > Reported-by: Scott Bruce > > Reported-by: Chris Hixon > > Reported-by: spassw...@web.de > > BugLink: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla. > kernel.org%2Fshow_bug.cgi%3Fid%3D215436&data=04%7C01%7Cmario.li > monciello%40amd.com%7C024b889dd8af440abbb008d9d1091f27%7C3dd8961f > e4884e608e11a82d994e183d%7C0%7C0%7C637770660416166334%7CUnknow > n%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haW > wiLCJXVCI6Mn0%3D%7C3000&sdata=rzNDgWba05cL5Z6CTvlblJS96R%2F1 > 8Vh7ku0S%2FQTRbHQ%3D&reserved=0 > > BugLink: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.fr > eedesktop.org%2Fdrm%2Famd%2F- > %2Fissues%2F1821&data=04%7C01%7Cmario.limonciello%40amd.com%7 > C024b889dd8af440abbb008d9d1091f27%7C3dd8961fe4884e608e11a82d994e1 > 83d%7C0%7C0%7C637770660416166334%7CUnknown%7CTWFpbGZsb3d8eyJ > WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C > 3000&sdata=TQW1vagoGW1DdNvsTVFKlA7gdJVvWhnxZU6BZPn3MH4%3D > &reserved=0 > > BugLink: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.fr > eedesktop.org%2Fdrm%2Famd%2F- > %2Fissues%2F1852&data=04%7C01%7Cmario.limonciello%40amd.com%7 > C024b889dd8af440abbb008d9d1091f27%7C3dd8961fe4884e608e11a82d994e1 > 83d%7C0%7C0%7C637770660416166334%7CUnknown%7CTWFpbGZsb3d8eyJ > WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C > 3000&sdata=N4C5PRnke4%2Bpswb3oAMhNsPq3AMLK5VuJG7Ht%2F1n0jk > %3D&reserved=0 > > What is "BugLink"? It's not mention in the kernel's documentation, which > tells people to use "Link:" for linking to bug reports: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ke > rnel.org%2Fdoc%2Fhtml%2Flatest%2Fprocess%2Fsubmitting- > patches.html&data=04%7C01%7Cmario.limonciello%40amd.com%7C024b > 889dd8af440abbb008d9d1091f27%7C3dd8961fe4884e608e11a82d994e183d%7 > C0%7C0%7C637770660416166334%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC > 4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&a > mp;sdata=GntDRw3WuCUZ%2Fys9uDAltqDs2zsBR4qQm3KdDA3VBKo%3D& > reserved=0 > > That is not new, bug recently was made more explicit. 
> > Hence, please consider changing them to "Link:", there are tools out > there that repy on them (I'm running such a tool, but there might be > others out there we are not aware of). Thanks for enlightening me. If I need to spin for v4 I'll fix it up on next submission, otherwise I'll fix it up v3 manually before it goes into amd-staging-drm-next. > > FWIW, in principe I agree that we need a seperate tag for this. But then > we should get this discussed and blessed through the usual channels. > That's why I recently proposed "Reported:", without much success so far: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker > nel.org%2Flkml%2Fc6724c7fb534a46e095e6aef13d996ed9589e578.163904296 > 6.git.linux%40leemhuis.info%2F&data=04%7C01%7Cmario.limonciello%40 > amd.com%7C024b889dd8af440abbb008d9d1091f27%7C3dd8961fe4884e608e1 > 1a82d994e183d%7C0%7C0%7C637770660416166334%7CUnknown%7CTWFpb > GZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6M > n0%3D%7C3000&sdata=oBGtPqXfzBcn%2FqUrH7hDQtMIXwIKBY6xh3Qqd0 > xkRjg%3D&reserved=0 At least some distributions kernel teams use BugLink (presumably from their own tools). > > Ciao, Thorsten > > > Fixes: 5965280abd30 ("drm/amd/display: Apply w/a for hard hang on HPD") > > Fixes: 1bd3bc745e7f ("drm/amd/display: Extend w/a for hard hang on HPD to > dcn20") > > Signed-off-by: Mario Limonciello > > --- > > changes from v1->v2: > > * Add fallthrough statement > > * Extend case to check if call was explicitly in s0ix since #1852 sho
[PATCH] drm/amd/display: explicitly set is_dsc_supported to false before use
When UBSAN is enabled a case is shown on unplugging the display that this variable hasn't been initialized by `update_dsc_caps`, presumably when the display was unplugged it wasn't copied from the DPCD. Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1956497 Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 5d20807b6f88..3d81314e6cb4 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -6093,6 +6093,7 @@ static void update_dsc_caps(struct amdgpu_dm_connector *aconnector, struct dsc_dec_dpcd_caps *dsc_caps) { stream->timing.flags.DSC = 0; + dsc_caps->is_dsc_supported = false; if (aconnector->dc_link && (sink->sink_signal == SIGNAL_TYPE_DISPLAY_PORT || sink->sink_signal == SIGNAL_TYPE_EDP)) { -- 2.25.1
RE: [PATCH 2/7] drm/amd/pm: drop unneeded vcn/jpeg_gate_lock
[Public] err0_out: - mutex_unlock(&power_gate->jpeg_gate_lock); - mutex_unlock(&power_gate->vcn_gate_lock); - return ret; Is it better to refactor the code to drop error path of "err0_out"? Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Evan Quan Sent: Thursday, January 6, 2022 1:57 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Quan, Evan Subject: [PATCH 2/7] drm/amd/pm: drop unneeded vcn/jpeg_gate_lock As those related APIs are already protected by adev->pm.mutex. Signed-off-by: Evan Quan Change-Id: I762fab96bb1c034c153b029f939ec6e498460007 --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 56 +++ drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 2 - 2 files changed, 8 insertions(+), 50 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index ae91ef2078bf..ecbc768dfe2f 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -158,8 +158,8 @@ static u32 smu_get_sclk(void *handle, bool low) return clk_freq * 100; } -static int smu_dpm_set_vcn_enable_locked(struct smu_context *smu, -bool enable) +static int smu_dpm_set_vcn_enable(struct smu_context *smu, + bool enable) { struct smu_power_context *smu_power = &smu->smu_power; struct smu_power_gate *power_gate = &smu_power->power_gate; @@ -178,24 +178,8 @@ static int smu_dpm_set_vcn_enable_locked(struct smu_context *smu, return ret; } -static int smu_dpm_set_vcn_enable(struct smu_context *smu, - bool enable) -{ - struct smu_power_context *smu_power = &smu->smu_power; - struct smu_power_gate *power_gate = &smu_power->power_gate; - int ret = 0; - - mutex_lock(&power_gate->vcn_gate_lock); - - ret = smu_dpm_set_vcn_enable_locked(smu, enable); - - mutex_unlock(&power_gate->vcn_gate_lock); - - return ret; -} - -static int smu_dpm_set_jpeg_enable_locked(struct smu_context *smu, - bool enable) +static int smu_dpm_set_jpeg_enable(struct smu_context *smu, + bool enable) { struct smu_power_context *smu_power = &smu->smu_power; struct smu_power_gate *power_gate = &smu_power->power_gate; @@ -214,22 +198,6 @@ static int smu_dpm_set_jpeg_enable_locked(struct smu_context *smu, return ret; } -static int smu_dpm_set_jpeg_enable(struct smu_context *smu, - bool enable) -{ - struct smu_power_context *smu_power = &smu->smu_power; - struct smu_power_gate *power_gate = &smu_power->power_gate; - int ret = 0; - - mutex_lock(&power_gate->jpeg_gate_lock); - - ret = smu_dpm_set_jpeg_enable_locked(smu, enable); - - mutex_unlock(&power_gate->jpeg_gate_lock); - - return ret; -} - /** * smu_dpm_set_power_gate - power gate/ungate the specific IP block * @@ -619,17 +587,14 @@ static int smu_set_default_dpm_table(struct smu_context *smu) if (!smu->ppt_funcs->set_default_dpm_table) return 0; - mutex_lock(&power_gate->vcn_gate_lock); - mutex_lock(&power_gate->jpeg_gate_lock); - vcn_gate = atomic_read(&power_gate->vcn_gated); jpeg_gate = atomic_read(&power_gate->jpeg_gated); - ret = smu_dpm_set_vcn_enable_locked(smu, true); + ret = smu_dpm_set_vcn_enable(smu, true); if (ret) goto err0_out; - ret = smu_dpm_set_jpeg_enable_locked(smu, true); + ret = smu_dpm_set_jpeg_enable(smu, true); if (ret) goto err1_out; @@ -638,13 +603,10 @@ static int smu_set_default_dpm_table(struct smu_context *smu) dev_err(smu->adev->dev, "Failed to setup default dpm clock tables!\n"); - smu_dpm_set_jpeg_enable_locked(smu, !jpeg_gate); + smu_dpm_set_jpeg_enable(smu, !jpeg_gate); err1_out: - smu_dpm_set_vcn_enable_locked(smu, !vcn_gate); + smu_dpm_set_vcn_enable(smu, 
!vcn_gate); err0_out: - mutex_unlock(&power_gate->jpeg_gate_lock); - mutex_unlock(&power_gate->vcn_gate_lock); - return ret; } @@ -1006,8 +968,6 @@ static int smu_sw_init(void *handle) atomic_set(&smu->smu_power.power_gate.vcn_gated, 1); atomic_set(&smu->smu_power.power_gate.jpeg_gated, 1); - mutex_init(&smu->smu_power.power_gate.vcn_gate_lock); - mutex_init(&smu->smu_power.power_gate.jpeg_gate_lock); smu->workload_mask = 1 << smu->workload_prority[PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT]; smu->workload_prority[PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT] = 0; diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h index 00760f3c6da5..c3efe4fea5e0 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdg
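A sketch of the refactor being asked about (an assumption only, not part of the posted patch): with the locking gone, the err0_out label only returns, so the first failure can return directly and a single label (err_out here, replacing err1_out) is left to unwind the VCN ungate.

	/* Hypothetical follow-up shape of smu_set_default_dpm_table(): */
	vcn_gate = atomic_read(&power_gate->vcn_gated);
	jpeg_gate = atomic_read(&power_gate->jpeg_gated);

	ret = smu_dpm_set_vcn_enable(smu, true);
	if (ret)
		return ret;	/* nothing to unwind yet */

	ret = smu_dpm_set_jpeg_enable(smu, true);
	if (ret)
		goto err_out;

	ret = smu->ppt_funcs->set_default_dpm_table(smu);
	if (ret)
		dev_err(smu->adev->dev, "Failed to setup default dpm clock tables!\n");

	smu_dpm_set_jpeg_enable(smu, !jpeg_gate);
err_out:
	smu_dpm_set_vcn_enable(smu, !vcn_gate);

	return ret;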
[PATCH] Revert "i2c: core: support bus regulator controlling in adapter"
This largely reverts commit 5a7b95fb993ec399c8a685552aa6a8fc995c40bd. It breaks suspend with AMD GPUs, and we couldn't incrementally fix it. So, let's remove the code and go back to the drawing board. We keep the header extension to not break drivers already populating the regulator. We expect to re-add the code handling it soon. Reported-by: "Tareque Md.Hanif" Link: https://lore.kernel.org/r/1295184560.182511.1639075777...@mail.yahoo.com Reported-by: Konstantin Kharlamov Link: https://lore.kernel.org/r/7143a7147978f4104171072d9f5225d2ce355ec1.ca...@yandex.ru BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1850 Signed-off-by: Wolfram Sang --- So far, I tested it on a Renesas R-Car M3-N board verifying that I2C still works. I'll apply it to my for-next branch right away to get the buildbots involved as well. But I am still open for comments until I apply it to my for-current branch, probably tomorrow. drivers/i2c/i2c-core-base.c | 95 - 1 file changed, 95 deletions(-) diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c index f193f9058584..73253e667de1 100644 --- a/drivers/i2c/i2c-core-base.c +++ b/drivers/i2c/i2c-core-base.c @@ -466,14 +466,12 @@ static int i2c_smbus_host_notify_to_irq(const struct i2c_client *client) static int i2c_device_probe(struct device *dev) { struct i2c_client *client = i2c_verify_client(dev); - struct i2c_adapter *adap; struct i2c_driver *driver; int status; if (!client) return 0; - adap = client->adapter; client->irq = client->init_irq; if (!client->irq) { @@ -539,14 +537,6 @@ static int i2c_device_probe(struct device *dev) dev_dbg(dev, "probe\n"); - if (adap->bus_regulator) { - status = regulator_enable(adap->bus_regulator); - if (status < 0) { - dev_err(&adap->dev, "Failed to enable bus regulator\n"); - goto err_clear_wakeup_irq; - } - } - status = of_clk_set_defaults(dev->of_node, false); if (status < 0) goto err_clear_wakeup_irq; @@ -605,10 +595,8 @@ static int i2c_device_probe(struct device *dev) static void i2c_device_remove(struct device *dev) { struct i2c_client *client = to_i2c_client(dev); - struct i2c_adapter *adap; struct i2c_driver *driver; - adap = client->adapter; driver = to_i2c_driver(dev->driver); if (driver->remove) { int status; @@ -623,8 +611,6 @@ static void i2c_device_remove(struct device *dev) devres_release_group(&client->dev, client->devres_group_id); dev_pm_domain_detach(&client->dev, !i2c_acpi_waive_d0_probe(dev)); - if (!pm_runtime_status_suspended(&client->dev) && adap->bus_regulator) - regulator_disable(adap->bus_regulator); dev_pm_clear_wake_irq(&client->dev); device_init_wakeup(&client->dev, false); @@ -634,86 +620,6 @@ static void i2c_device_remove(struct device *dev) pm_runtime_put(&client->adapter->dev); } -#ifdef CONFIG_PM_SLEEP -static int i2c_resume_early(struct device *dev) -{ - struct i2c_client *client = i2c_verify_client(dev); - int err; - - if (!client) - return 0; - - if (pm_runtime_status_suspended(&client->dev) && - client->adapter->bus_regulator) { - err = regulator_enable(client->adapter->bus_regulator); - if (err) - return err; - } - - return pm_generic_resume_early(&client->dev); -} - -static int i2c_suspend_late(struct device *dev) -{ - struct i2c_client *client = i2c_verify_client(dev); - int err; - - if (!client) - return 0; - - err = pm_generic_suspend_late(&client->dev); - if (err) - return err; - - if (!pm_runtime_status_suspended(&client->dev) && - client->adapter->bus_regulator) - return regulator_disable(client->adapter->bus_regulator); - - return 0; -} -#endif - -#ifdef 
CONFIG_PM -static int i2c_runtime_resume(struct device *dev) -{ - struct i2c_client *client = i2c_verify_client(dev); - int err; - - if (!client) - return 0; - - if (client->adapter->bus_regulator) { - err = regulator_enable(client->adapter->bus_regulator); - if (err) - return err; - } - - return pm_generic_runtime_resume(&client->dev); -} - -static int i2c_runtime_suspend(struct device *dev) -{ - struct i2c_client *client = i2c_verify_client(dev); - int err; - - if (!client) - return 0; - - err = pm_generic_runtime_suspend(&client->dev); - if (err) - return err; - - if (client->adapter->bus_regulator) - retu
Re: [PATCH v4 2/6] drm: improve drm_buddy_alloc function
Hi Am 03.01.22 um 08:41 schrieb Christian König: Am 26.12.21 um 21:59 schrieb Arunpravin: Hi Thomas On 16/12/21 5:05 pm, Thomas Zimmermann wrote: Hi Am 01.12.21 um 17:39 schrieb Arunpravin: - Make drm_buddy_alloc a single function to handle range allocation and non-range allocation demands - Implemented a new function alloc_range() which allocates the requested power-of-two block comply with range limitations - Moved order computation and memory alignment logic from i915 driver to drm buddy v2: merged below changes to keep the build unbroken - drm_buddy_alloc_range() becomes obsolete and may be removed - enable ttm range allocation (fpfn / lpfn) support in i915 driver - apply enhanced drm_buddy_alloc() function to i915 driver v3(Matthew Auld): - Fix alignment issues and remove unnecessary list_empty check - add more validation checks for input arguments - make alloc_range() block allocations as bottom-up - optimize order computation logic - replace uint64_t with u64, which is preferred in the kernel v4(Matthew Auld): - keep drm_buddy_alloc_range() function implementation for generic actual range allocations - keep alloc_range() implementation for end bias allocations Signed-off-by: Arunpravin +#define DRM_BUDDY_RANGE_ALLOCATION (1 << 0) + struct drm_buddy_block { #define DRM_BUDDY_HEADER_OFFSET GENMASK_ULL(63, 12) #define DRM_BUDDY_HEADER_STATE GENMASK_ULL(11, 10) @@ -132,12 +139,11 @@ int drm_buddy_init(struct drm_buddy_mm *mm, u64 size, u64 chunk_size); void drm_buddy_fini(struct drm_buddy_mm *mm); -struct drm_buddy_block * -drm_buddy_alloc(struct drm_buddy_mm *mm, unsigned int order); Just a style issue. The structure is called drm_buddy_mm. For consistency, I like to suggest to name all the public interfaces and defines drm_buddy_mm_* instead of just drm_buddy_*. Thanks for the suggestion, I think renaming drm_buddy_* to drm_buddy_mm_* creates a long name for the public interfaces, for instance - drm_buddy_mm_alloc_blocks(), discussing the style issue internally @Matthew, @christian - please let me know your opinion I would prefer drm_buddy as prefix as well and I think we could rather drop the _mm postfix from the structure here. I was mostly concerned about mismatching names for functions and structures. If drm_buddy_ (without mm) for all naming is preferred, I wouldn't mind. Best regards Thomas Cause what we try to manage is not necessary memory, but rather address space. Christian. - -int drm_buddy_alloc_range(struct drm_buddy_mm *mm, - struct list_head *blocks, - u64 start, u64 size); +int drm_buddy_alloc(struct drm_buddy_mm *mm, + u64 start, u64 end, u64 size, + u64 min_page_size, + struct list_head *blocks, + unsigned long flags); void drm_buddy_free(struct drm_buddy_mm *mm, struct drm_buddy_block *block); I'd call those *_alloc_blocks() and _free_blocks(). Right now it sounds as if they allocate and free instances of drm_buddy_mm. can we call those drm_buddy_alloc_blocks() and drm_buddy_free_blocks() Does this make sense? Best regards Thomas -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev OpenPGP_signature Description: OpenPGP digital signature
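Pulling the naming threads above together, a sketch of where the interface seems to be converging (an assumption about the respin, not the v4 patch itself): drop the _mm postfix from the structure, as Christian suggests, and add the _blocks suffix Thomas asked for, keeping the parameter lists from the patch under review.

struct drm_buddy;	/* was: struct drm_buddy_mm */

int drm_buddy_alloc_blocks(struct drm_buddy *mm,
			   u64 start, u64 end, u64 size,
			   u64 min_page_size,
			   struct list_head *blocks,
			   unsigned long flags);

/* per-block free, renamed per the suggestion above */
void drm_buddy_free_blocks(struct drm_buddy *mm,
			   struct drm_buddy_block *block);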
RE: [PATCH] drm/amdgpu: Clear garbage data in err_data before usage
[AMD Official Use Only] With one uncorrectable error injection via the ras_ctrl sysfs node on Sienna Cichlid, two interrupts will be triggered. I was informed the two interrupts are as expected: when the error address is not 64byte aligned, one 64Byte SDP request will be split into two 32Byte requests in UMC and sent to dram. Then the second interrupt handling will read the garbage data in err_data. And the consequence is that the ue counter is increased by 2, and a page at address 0x0 will be saved unexpectedly. Best regards, Jiawei -Original Message- From: Zhou1, Tao Sent: Thursday, January 6, 2022 6:05 PM To: Gu, JiaWei (Will) ; amd-gfx@lists.freedesktop.org; Clements, John ; Yang, Stanley ; Deng, Emily Cc: Gu, JiaWei (Will) Subject: RE: [PATCH] drm/amdgpu: Clear garbage data in err_data before usage [AMD Official Use Only] Reviewed-by: Tao Zhou May I know how do you reproduce the issue? > -Original Message- > From: amd-gfx On Behalf Of > Jiawei Gu > Sent: Thursday, January 6, 2022 5:17 PM > To: amd-gfx@lists.freedesktop.org; Clements, John > ; Yang, Stanley ; Deng, > Emily > Cc: Gu, JiaWei (Will) > Subject: [PATCH] drm/amdgpu: Clear garbage data in err_data before > usage > > Memory of err_data should be cleaned before usage when there're > multiple entry in ras ih. > Otherwise garbage data from last loop will be used. > > Signed-off-by: Jiawei Gu > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > index 31bad1a20ed0..3f5bf5780ebf 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > @@ -1592,6 +1592,7 @@ static void amdgpu_ras_interrupt_handler(struct > ras_manager *obj) > /* Let IP handle its data, maybe we need get > the output >* from the callback to udpate the error > type/count, etc >*/ > + memset(&err_data, 0, sizeof(err_data)); > ret = data->cb(obj->adev, &err_data, &entry); > /* ue will trigger an interrupt, and in that > case >* we need do a reset to recovery the whole > system. > -- > 2.17.1
RE: [PATCH] drm/amdgpu: Clear garbage data in err_data before usage
[AMD Official Use Only] Reviewed-by: Tao Zhou May I know how do you reproduce the issue? > -Original Message- > From: amd-gfx On Behalf Of Jiawei > Gu > Sent: Thursday, January 6, 2022 5:17 PM > To: amd-gfx@lists.freedesktop.org; Clements, John > ; Yang, Stanley ; Deng, > Emily > Cc: Gu, JiaWei (Will) > Subject: [PATCH] drm/amdgpu: Clear garbage data in err_data before usage > > Memory of err_data should be cleaned before usage when there're multiple > entry in ras ih. > Otherwise garbage data from last loop will be used. > > Signed-off-by: Jiawei Gu > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > index 31bad1a20ed0..3f5bf5780ebf 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > @@ -1592,6 +1592,7 @@ static void amdgpu_ras_interrupt_handler(struct > ras_manager *obj) > /* Let IP handle its data, maybe we need get > the output >* from the callback to udpate the error > type/count, etc >*/ > + memset(&err_data, 0, sizeof(err_data)); > ret = data->cb(obj->adev, &err_data, &entry); > /* ue will trigger an interrupt, and in that > case >* we need do a reset to recovery the whole > system. > -- > 2.17.1
Re: [PATCH 1/2] fbdev: fbmem: add a helper to determine if an aperture is used by a fw fb
Hi Am 27.12.21 um 19:25 schrieb Alex Deucher: Add a function for drivers to check if the a firmware initialized fb is corresponds to their aperture. This allows drivers to check if the device corresponds to what the firmware set up as the display device. If simpledrm is in use, it will register via DRM aperture helpers. You probably need a similar function in drm_aperture.c to handle this. Best regards Thomas Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215203 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1840 Signed-off-by: Alex Deucher --- drivers/video/fbdev/core/fbmem.c | 47 include/linux/fb.h | 1 + 2 files changed, 48 insertions(+) diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c index 826175ad88a2..0fa7ede94fa6 100644 --- a/drivers/video/fbdev/core/fbmem.c +++ b/drivers/video/fbdev/core/fbmem.c @@ -1762,6 +1762,53 @@ int remove_conflicting_framebuffers(struct apertures_struct *a, } EXPORT_SYMBOL(remove_conflicting_framebuffers); +/** + * is_firmware_framebuffer - detect if firmware-configured framebuffer matches + * @a: memory range, users of which are to be checked + * + * This function checks framebuffer devices (initialized by firmware/bootloader) + * which use memory range described by @a. If @a matchesm the function returns + * true, otherwise false. + */ +bool is_firmware_framebuffer(struct apertures_struct *a) +{ + bool do_free = false; + bool found = false; + int i; + + if (!a) { + a = alloc_apertures(1); + if (!a) + return false; + + a->ranges[0].base = 0; + a->ranges[0].size = ~0; + do_free = true; + } + + mutex_lock(®istration_lock); + /* check all firmware fbs and kick off if the base addr overlaps */ + for_each_registered_fb(i) { + struct apertures_struct *gen_aper; + + if (!(registered_fb[i]->flags & FBINFO_MISC_FIRMWARE)) + continue; + + gen_aper = registered_fb[i]->apertures; + if (fb_do_apertures_overlap(gen_aper, a)) { + found = true; + break; + } + } + mutex_unlock(®istration_lock); + + if (do_free) + kfree(a); + + return found; +} +EXPORT_SYMBOL(is_firmware_framebuffer); + /** * remove_conflicting_pci_framebuffers - remove firmware-configured framebuffers for PCI devices * @pdev: PCI device diff --git a/include/linux/fb.h b/include/linux/fb.h index 6f3db99ab990..3da95842b207 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -610,6 +610,7 @@ extern int remove_conflicting_pci_framebuffers(struct pci_dev *pdev, const char *name); extern int remove_conflicting_framebuffers(struct apertures_struct *a, const char *name, bool primary); +extern bool is_firmware_framebuffer(struct apertures_struct *a); extern int fb_prepare_logo(struct fb_info *fb_info, int rotate); extern int fb_show_logo(struct fb_info *fb_info, int rotate); extern char* fb_get_buffer_offset(struct fb_info *info, struct fb_pixmap *buf, u32 size); -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev OpenPGP_signature Description: OpenPGP digital signature
Re: Bug: amdgpu stutter and spam `Fence fallback timer expired` after suspend
> Should I send a revert, or is there a way to fix this? You are the second one to report this problem and so far, there has been no response from the authors. I will prepare a revert and CC all involved people. Thank you for the nagging!
Re: [PATCH] drm/amdgpu: Use correct VIEWPORT_DIMENSION for DCN2
Am 05.01.22 um 21:39 schrieb Harry Wentland: For some reason this file isn't using the appropriate register headers for DCN headers, which means that on DCN2 we're getting the VIEWPORT_DIMENSION offset wrong. This means that we're not correctly carving out the framebuffer memory correctly for a framebuffer allocated by EFI and therefore see corruption when loading amdgpu before the display driver takes over control of the framebuffer scanout. Fix this by checking the DCE_HWIP and picking the correct offset accordingly. Long-term we should expose this info from DC as GMC shouldn't need to know about DCN registers. Signed-off-by: Harry Wentland Please add a TODO comment that in the long run that code should be moved into DC. With that done the patch is Acked-by: Christian König . Thanks, Christian. --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 57f2729a7bd0..8367ecf61af1 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -72,6 +72,9 @@ #define mmDCHUBBUB_SDPIF_MMIO_CNTRL_0 0x049d #define mmDCHUBBUB_SDPIF_MMIO_CNTRL_0_BASE_IDX 2 +#define DCN2_mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION 0x05ea +#define DCN2_mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION_BASE_IDX 2 + static const char *gfxhub_client_ids[] = { "CB", @@ -1142,7 +1145,6 @@ static unsigned gmc_v9_0_get_vbios_fb_size(struct amdgpu_device *adev) switch (adev->ip_versions[DCE_HWIP][0]) { case IP_VERSION(1, 0, 0): case IP_VERSION(1, 0, 1): - case IP_VERSION(2, 1, 0): viewport = RREG32_SOC15(DCE, 0, mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION); size = (REG_GET_FIELD(viewport, HUBP0_DCSURF_PRI_VIEWPORT_DIMENSION, PRI_VIEWPORT_HEIGHT) * @@ -1150,6 +1152,14 @@ static unsigned gmc_v9_0_get_vbios_fb_size(struct amdgpu_device *adev) HUBP0_DCSURF_PRI_VIEWPORT_DIMENSION, PRI_VIEWPORT_WIDTH) * 4); break; + case IP_VERSION(2, 1, 0): + viewport = RREG32_SOC15(DCE, 0, DCN2_mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION); + size = (REG_GET_FIELD(viewport, + HUBP0_DCSURF_PRI_VIEWPORT_DIMENSION, PRI_VIEWPORT_HEIGHT) * + REG_GET_FIELD(viewport, + HUBP0_DCSURF_PRI_VIEWPORT_DIMENSION, PRI_VIEWPORT_WIDTH) * + 4); + break; default: viewport = RREG32_SOC15(DCE, 0, mmSCL0_VIEWPORT_SIZE); size = (REG_GET_FIELD(viewport, SCL0_VIEWPORT_SIZE, VIEWPORT_HEIGHT) *
[PATCH] drm/amdgpu: Clear garbage data in err_data before usage
Memory of err_data should be cleaned before usage when there're multiple entry in ras ih. Otherwise garbage data from last loop will be used. Signed-off-by: Jiawei Gu --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 31bad1a20ed0..3f5bf5780ebf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -1592,6 +1592,7 @@ static void amdgpu_ras_interrupt_handler(struct ras_manager *obj) /* Let IP handle its data, maybe we need get the output * from the callback to udpate the error type/count, etc */ + memset(&err_data, 0, sizeof(err_data)); ret = data->cb(obj->adev, &err_data, &entry); /* ue will trigger an interrupt, and in that case * we need do a reset to recovery the whole system. -- 2.17.1
Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV
Am 06.01.22 um 06:18 schrieb JingWen Chen: On 2022/1/6 下午12:59, JingWen Chen wrote: On 2022/1/6 上午2:24, Andrey Grodzovsky wrote: On 2022-01-05 2:59 a.m., Christian König wrote: Am 05.01.22 um 08:34 schrieb JingWen Chen: On 2022/1/5 上午12:56, Andrey Grodzovsky wrote: On 2022-01-04 6:36 a.m., Christian König wrote: Am 04.01.22 um 11:49 schrieb Liu, Monk: [AMD Official Use Only] See the FLR request from the hypervisor is just another source of signaling the need for a reset, similar to each job timeout on each queue. Otherwise you have a race condition between the hypervisor and the scheduler. No it's not, FLR from hypervisor is just to notify guest the hw VF FLR is about to start or was already executed, but host will do FLR anyway without waiting for guest too long Then we have a major design issue in the SRIOV protocol and really need to question this. How do you want to prevent a race between the hypervisor resetting the hardware and the client trying the same because of a timeout? As far as I can see the procedure should be: 1. We detect that a reset is necessary, either because of a fault a timeout or signal from hypervisor. 2. For each of those potential reset sources a work item is send to the single workqueue. 3. One of those work items execute first and prepares the reset. 4. We either do the reset our self or notify the hypervisor that we are ready for the reset. 5. Cleanup after the reset, eventually resubmit jobs etc.. 6. Cancel work items which might have been scheduled from other reset sources. It does make sense that the hypervisor resets the hardware without waiting for the clients for too long, but if we don't follow this general steps we will always have a race between the different components. Monk, just to add to this - if indeed as you say that 'FLR from hypervisor is just to notify guest the hw VF FLR is about to start or was already executed, but host will do FLR anyway without waiting for guest too long' and there is no strict waiting from the hypervisor for IDH_READY_TO_RESET to be recived from guest before starting the reset then setting in_gpu_reset and locking reset_sem from guest side is not really full proof protection from MMIO accesses by the guest - it only truly helps if hypervisor waits for that message before initiation of HW reset. Hi Andrey, this cannot be done. If somehow guest kernel hangs and never has the chance to send the response back, then other VFs will have to wait it reset. All the vfs will hang in this case. Or sometimes the mailbox has some delay and other VFs will also wait. The user of other VFs will be affected in this case. Yeah, agree completely with JingWen. The hypervisor is the one in charge here, not the guest. What the hypervisor should do (and it already seems to be designed that way) is to send the guest a message that a reset is about to happen and give it some time to response appropriately. The guest on the other hand then tells the hypervisor that all processing has stopped and it is ready to restart. If that doesn't happen in time the hypervisor should eliminate the guest probably trigger even more severe consequences, e.g. restart the whole VM etc... Christian. So what's the end conclusion here regarding dropping this particular patch ? Seems to me we still need to drop it to prevent driver's MMIO access to the GPU during reset from various places in the code. Andrey Hi Andrey & Christian, I have ported your patch(drop the reset_sem and in_gpu_reset in flr work) and run some tests. 
If a engine hang during an OCL benchmark(using kfd), we can see the logs below: [ 397.190727] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.301496] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.406601] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.532343] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.642251] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.746634] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.850761] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 397.960544] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 398.065218] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 398.182173] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 398.288264] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 398.394712] amdgpu :00:07.0: amdgpu: wait for kiq fence error: 0. [ 428.400582] [drm] clean up the vf2pf work item [ 428.500528] amdgpu :00:07.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:8 pasid:32771, for process xgemmStandalone pid 3557 thread xgemmStandalone pid 3557) [ 428.527576] amdgpu :00:07.0: amdgpu: in page starting at address 0x7fc991c04000 from client 0x1b (UTCL2) [ 437.531392] amdgpu: qcm fence wait loop timeout expired [ 437.535738] amdgpu: The cp might be in an unrecoverable state due to
Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process
Am 05.01.22 um 17:16 schrieb Felix Kuehling: [SNIP] But KFD doesn't know anything about the inherited BOs from the parent process. Ok, why that? When the KFD is reinitializing its context why shouldn't it clean up those VMAs? That cleanup has to be initiated by user mode. Basically closing the old KFD and DRM file descriptors, cleaning up all the user mode VM state, unmapping all the VMAs, etc. Then it reopens KFD and the render nodes and starts from scratch. User mode will do this automatically when it tries to reinitialize ROCm. However, in this case the child process doesn't do that (e.g. a python application using the multi-processing package). The child process does not use ROCm. But you're left with all the dangling VMAs in the child process indefinitely. Oh, not that one again. I'm unfortunately pretty sure that this is a clear NAK then. This python multi-processing package is violating various specifications by doing this fork() and we already had multiple discussions about that. Let's talk about this on Monday's call. Thanks for giving the whole context. Regards, Christian. Regards, Felix
Re: [PATCH 3/7] drm/amd/pm: drop unneeded smu->metrics_lock
On 1/6/2022 12:32 PM, Quan, Evan wrote: [AMD Official Use Only] -Original Message- From: Lazar, Lijo Sent: Thursday, January 6, 2022 2:17 PM To: Quan, Evan ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: [PATCH 3/7] drm/amd/pm: drop unneeded smu->metrics_lock On 1/6/2022 11:27 AM, Evan Quan wrote: As all those related APIs are already well protected by adev->pm.mutex and smu->message_lock. This one may be widely used. Instead of relying on pm.mutex it's better to keep metrics lock so that multiple clients can read data without waiting on other APIs that use pm.mutex. [Quan, Evan] If I understand it correctly, what you wanted to express is to use fine-grained lock instead of cross-grained one to avoid chasing for the same lock. Yes, that was what we did before and that's why we have so many types of locks. Below are my considerations for this: 1. We actually do not have such issue that many APIs/clients chasing for the same lock. Thus fine-grained locks cannot bring much benefits. Take the metrics_lock here for example. The data protected by metrics_lock are for those pm sysfs interfaces. Those sysfs interface are not so frequently called. And almost all the time, they are called one after one. So, it's rarely they will chase for the same lock. It's not just sysfs, there are other interfaces like sensors, hwmons etc. Basically, metrics table provides data like GFX activity or throttler status that may be continuously monitored by app layer. So other APIs could suffer. My thought is to just keep metrics under a separate lock and not tie with pm.mutex. Thanks, Lijo 2. Cross-grained lock can simplify our implementations. It's hard to believe, there is 10+(actually 13 as I counted) different types of locks used in our existing power code. By the cross-grained lock, we can simplify the code and protect us from some unintentional dead-locks(I actually run into that several times and it's really tricky). 
BR Evan Thanks, Lijo Signed-off-by: Evan Quan Change-Id: Ic75326ba7b4b67be8762d5407d02f6c514e1ad35 --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 1 - drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 1 - .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 14 +-- .../amd/pm/swsmu/smu11/cyan_skillfish_ppt.c | 10 +- .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 112 + - .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 27 ++--- .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 28 ++--- .../gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c | 14 +-- .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 23 ++-- .../drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c | 10 +- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c| 21 +--- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h| 4 - 12 files changed, 70 insertions(+), 195 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index ecbc768dfe2f..f0136bf36533 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -956,7 +956,6 @@ static int smu_sw_init(void *handle) bitmap_zero(smu->smu_feature.allowed, SMU_FEATURE_MAX); mutex_init(&smu->sensor_lock); - mutex_init(&smu->metrics_lock); mutex_init(&smu->message_lock); INIT_WORK(&smu->throttling_logging_work, smu_throttling_logging_work_fn); diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h index c3efe4fea5e0..63ed807c96f5 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h @@ -487,7 +487,6 @@ struct smu_context const struct cmn2asic_mapping *pwr_src_map; const struct cmn2asic_mapping *workload_map; struct mutexsensor_lock; - struct mutexmetrics_lock; struct mutexmessage_lock; uint64_t pool_size; diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c index addb0472d040..3f7c1f23475b 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c @@ -602,15 +602,11 @@ static int arcturus_get_smu_metrics_data(struct smu_context *smu, SmuMetrics_t *metrics = (SmuMetrics_t *)smu_table- metrics_table; int ret = 0; - mutex_lock(&smu->metrics_lock); - - ret = smu_cmn_get_metrics_table_locked(smu, - NULL, - false); - if (ret) { - mutex_unlock(&smu->metrics_lock); + ret = smu_cmn_get_metrics_table(smu, + NULL, + false); + if (ret) return ret; - } switch (member) { case MET
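To make the trade-off under discussion concrete, a simplified illustration (not the actual call sites; the function names follow the diff above): with a dedicated metrics_lock a metrics read only serializes against other metrics readers, whereas folding it under adev->pm.mutex makes the same read wait behind any other power-management API holding that mutex.

/* Fine-grained: metrics readers contend only with each other. */
static int read_metrics_fine_grained(struct smu_context *smu)
{
	int ret;

	mutex_lock(&smu->metrics_lock);
	ret = smu_cmn_get_metrics_table_locked(smu, NULL, false);
	mutex_unlock(&smu->metrics_lock);

	return ret;
}

/* Coarse-grained: the same read now also waits on anything holding pm.mutex. */
static int read_metrics_coarse_grained(struct amdgpu_device *adev,
				       struct smu_context *smu)
{
	int ret;

	mutex_lock(&adev->pm.mutex);
	ret = smu_cmn_get_metrics_table(smu, NULL, false);
	mutex_unlock(&adev->pm.mutex);

	return ret;
}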
[PATCH] drm/amdkfd: Check for null pointer after calling kmemdup
As the allocation may fail, kmemdup() can return a NULL pointer. Therefore, it is better to check 'props2' in order to prevent a NULL pointer dereference. Fixes: 3a87177eb141 ("drm/amdkfd: Add topology support for dGPUs") Signed-off-by: Jiasheng Jiang --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index c60e82697385..d15380c65c6d 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -410,6 +410,9 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink, return -ENODEV; /* same everything but the other direction */ props2 = kmemdup(props, sizeof(*props2), GFP_KERNEL); + if (!props2) + return -ENOMEM; + props2->node_from = id_to; props2->node_to = id_from; props2->kobj = NULL; -- 2.25.1
Re: Bug: amdgpu stutter and spam `Fence fallback timer expired` after suspend
Btw, please keep me on CC as I am not subscribed to the lists On Thu, 2022-01-06 at 00:26 +0300, Konstantin Kharlamov wrote: > Hello! Just a heads up to this report > https://gitlab.freedesktop.org/drm/amd/-/issues/1850 > > Basically, after waking up on suspend there're freezes in apps using amdgpu, > and there's a spam in syslog `Fence fallback timer expired on ring …`. > > I bisected that to a commit > > 5a7b95fb993ec399c8a685552aa6a8fc995c40bd i2c: core: support bus > regulator controlling in adapter > > Since the commit is not in AMDGPU, I send this email to linux-i2c. The commit > author CCed as well. > > Should I send a revert, or is there a way to fix this?
Bug: amdgpu stutter and spam `Fence fallback timer expired` after suspend
Hello! Just a heads up to this report https://gitlab.freedesktop.org/drm/amd/-/issues/1850 Basically, after waking up on suspend there're freezes in apps using amdgpu, and there's a spam in syslog `Fence fallback timer expired on ring …`. I bisected that to a commit 5a7b95fb993ec399c8a685552aa6a8fc995c40bd i2c: core: support bus regulator controlling in adapter Since the commit is not in AMDGPU, I send this email to linux-i2c. The commit author CCed as well. Should I send a revert, or is there a way to fix this?