On 12/24/2024 11:10, Daniele Ceraolo Spurio wrote:
On 12/24/2024 10:13 AM, John Harrison wrote:
On 12/23/2024 15:20, Daniele Ceraolo Spurio wrote:
On 12/20/2024 5:19 PM, [email protected] wrote:
From: John Harrison<[email protected]>
Add debug info to help investigate a very rare bug:
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13385
Signed-off-by: John Harrison<[email protected]>
---
drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
b/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
index b67a15f742762..868195c33f5b3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
@@ -7,6 +7,7 @@
#include "gt/intel_hwconfig.h"
#include "i915_drv.h"
#include "i915_memcpy.h"
+#include "intel_guc_print.h"
/*
* GuC has a blob containing hardware configuration information (HWConfig).
@@ -42,6 +43,8 @@ static int __guc_action_get_hwconfig(struct intel_guc *guc,
};
int ret;
+ guc_dbg(guc, "Querying HW config table: size = %d, offset = 0x%08X\n",
+ ggtt_size, ggtt_offset);
This seems to result in a double-log where the first print has no
useful information, e.g.:
[drm:__guc_action_get_hwconfig [i915]] GT0: GUC: Querying HW config
table: size = 0, offset = 0x00000000
[drm:__guc_action_get_hwconfig [i915]] GT0: GUC: Querying HW config
table: size = 752, offset = 0x00D05000
Given that only the second log is useful, IMO better to move the
guc_dbg to guc_hwconfig_fill_buffer(), because the info needed for
the second print is available there and it is only called once.
I disagree that the first print has no useful information. It tells
us that a call is being made and these are the parameters. We do not
know what the failure is. It seems highly unlikely that the size
changes from query to the next given that the table is a fixed
entity. It is much more likely to be a caching type issue with GuC
reading data the KMD did not write. If so, GuC could potentially read
non-zero data for the initial size query and complain that data is
invalid.
The intention is to report all calls with their parameters to try to
narrow down exactly what calls are not working.
But we don't need both prints to know which of the 2 calls has failed,
if the error comes before we get the second print then we know the
failure was in the first call, otherwise it was in the second.
Is it really such a problem to have two lines of debug print instead of
one? Given that this is a 'this cannot possibly happen' type bug, I
would much rather have totally explicit debug at the point of operation
rather than something further back that requires making assumptions
about what is or is not happening.
John.
Daniele
John.
Daniele
ret = intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0);
if (ret == -ENXIO)
return -ENOENT;