On 2/25/2022 22:27, Alan Previn wrote:
Fix our pointer offset usage in error_state_read
when there is no i915_gpu_coredump but buf offset
is non-zero.

This fixes a kernel page fault can happen when
multiple tests are running concurrently in a loop
and one is producing engine resets and consuming
the i915 error_state dump while the other is
forcing full GT resets. (takes a while to trigger).
Does need a fixes tag given that it is fixing a bug in an earlier patch. Especially when it is a kernel BUG.
E.g.:
13:23> dim fixes 0e39037b31655
Fixes: 0e39037b3165 ("drm/i915: Cache the error string")
Cc: Jason Ekstrand <ja...@jlekstrand.net>
Cc: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuopp...@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahti...@linux.intel.com>
Cc: Jani Nikula <jani.nik...@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.v...@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursu...@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <sta...@vger.kernel.org> # v5.0+


The dmesg call trace:

5014 [ 5590.803000] BUG: unable to handle page fault for address: 
ffffffffa0b0e000
5015 [ 5590.803009] #PF: supervisor read access in kernel mode
5016 [ 5590.803013] #PF: error_code(0x0000) - not-present page
5017 [ 5590.803016] PGD 5814067 P4D 5814067 PUD 5815063 PMD 109de4067 PTE 0
5018 [ 5590.803022] Oops: 0000 [#1] PREEMPT SMP NOPTI
5019 [ 5590.803026] CPU: 5 PID: 13656 Comm: i915_hangman Tainted: G U 
5.17.0-rc5-
                        ups69-guc-err-capt-rev6+ #136
5020 [ 5590.803033] Hardware name: Intel Corporation Alder Lake Client Platform/
                        AlderLake-M LP4x RVP, BIOS 
ADLPFWI1.R00.3031.A02.2201171222
                        01/17/2022
5021 [ 5590.803039] RIP: 0010:memcpy_erms+0x6/0x10
5022 [ 5590.803045] Code: fe ff ff cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 
03 83
                        e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 
f8 48 89
                        d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 
72 7e 40 38 fe
5023 [ 5590.803054] RSP: 0018:ffffc90003a8fdf0 EFLAGS: 00010282
5024 [ 5590.803057] RAX: ffff888107ee9000 RBX: ffff888108cb1a00 RCX: 
0000000000000f8f
5025 [ 5590.803061] RDX: 0000000000001000 RSI: ffffffffa0b0e000 RDI: 
ffff888107ee9071
5026 [ 5590.803065] RBP: 0000000000000000 R08: 0000000000000001 R09: 
0000000000000001
5027 [ 5590.803069] R10: 0000000000000001 R11: 0000000000000002 R12: 
0000000000000019
5028 [ 5590.803073] R13: 0000000000174fff R14: 0000000000001000 R15: 
ffff888107ee9000
5029 [ 5590.803077] FS: 00007f62a99bee80(0000) GS:ffff88849f880000(0000) 
knlGS:0000000000000000
5030 [ 5590.803082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
5031 [ 5590.803085] CR2: ffffffffa0b0e000 CR3: 000000010a1a8004 CR4: 
0000000000770ee0
5032 [ 5590.803089] PKRU: 55555554
5033 [ 5590.803091] Call Trace:
5034 [ 5590.803093] <TASK>
5035 [ 5590.803096] error_state_read+0xa1/0xd0 [i915]
5036 [ 5590.803175] kernfs_fop_read_iter+0xb2/0x1b0
5037 [ 5590.803180] new_sync_read+0x116/0x1a0
5038 [ 5590.803185] vfs_read+0x114/0x1b0
5039 [ 5590.803189] ksys_read+0x63/0xe0
5040 [ 5590.803193] do_syscall_64+0x38/0xc0
5041 [ 5590.803197] entry_SYSCALL_64_after_hwframe+0x44/0xae
5042 [ 5590.803201] RIP: 0033:0x7f62aaea5912
5043 [ 5590.803204] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 5a b9 0c 00 e8 05 19 02 
00 0f 1f 44 00
                        00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 
<48> 3d 00 f0
                        ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
5044 [ 5590.803213] RSP: 002b:00007fff5b659ae8 EFLAGS: 00000246 ORIG_RAX: 
0000000000000000
5045 [ 5590.803218] RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 
00007f62aaea5912
5046 [ 5590.803221] RDX: 000000000008b000 RSI: 00007f62a8c4000f RDI: 
0000000000000006
5047 [ 5590.803225] RBP: 00007f62a8bcb00f R08: 0000000000200010 R09: 
0000000000101000
5048 [ 5590.803229] R10: 0000000000000001 R11: 0000000000000246 R12: 
0000000000000006
5049 [ 5590.803233] R13: 0000000000075000 R14: 00007f62a8acb010 R15: 
0000000000200000
5050 [ 5590.803238] </TASK>
5051 [ 5590.803240] Modules linked in: i915 ttm drm_buddy drm_dp_helper 
drm_kms_helper syscopyarea
                        sysfillrect sysimgblt fb_sys_fops prime_numbers 
nfnetlink br_netfilter overlay
                        mei_pxp mei_hdcp x86_pkg_temp_thermal coretemp 
kvm_intel snd_hda_codec_hdmi
                        snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep 
snd_hda_core snd_pcm mei_me
                        mei fuse ip_tables x_tables crct10dif_pclmul e1000e 
crc32_pclmul ptp i2c_i801
                        ghash_clmulni_intel i2c_smbus pps_core [last unloa ded: 
ttm]
5052 [ 5590.803277] CR2: ffffffffa0b0e000
5053 [ 5590.803280] ---[ end trace 0000000000000000 ]---

Signed-off-by: Alan Previn <alan.previn.teres.ale...@intel.com>
---
  drivers/gpu/drm/i915/i915_sysfs.c | 8 +++++---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_sysfs.c 
b/drivers/gpu/drm/i915/i915_sysfs.c
index a4d1759375b9..c40e51298066 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file *filp, struct 
kobject *kobj,
        struct device *kdev = kobj_to_dev(kobj);
        struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
        struct i915_gpu_coredump *gpu;
-       ssize_t ret;
+       ssize_t ret = 0;
gpu = i915_first_error_state(i915);
        if (IS_ERR(gpu)) {
@@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file *filp, struct 
kobject *kobj,
                const char *str = "No error state collected\n";
                size_t len = strlen(str);
- ret = min_t(size_t, count, len - off);
-               memcpy(buf, str + off, ret);
+               if (off < len) {
+                       ret = min_t(size_t, count, len - off);
+                       memcpy(buf, str + off, ret);
+               }
So the problem is that the error dump disappeared while it was being read? That seems like it cause more problems than just this out-of-range access. E.g. what if the dump was freed and a new one created? The entity doing the partial reads would end up with half of one dump and half of the next.

I think we should at least add a FIXME comment to the code that fast recycling dumps could cause inconsistent sysfs reads.

I guess you could do something like add a unique id to the gpu coredump structure. Then, when a partial sysfs read occurs starting at zero (i.e. a new read), take a note of the id somewhere (e.g. inside the i915 structure). When the next non-zero read comes in, compare the save id with the current coredump's id and return an error if there is a mis-match.

Not sure if this would be viewed as an important enough problem to be worth fixing. I'd be happy with just a FIXME comment for now.

I would also change the test to be 'if (off)' rather than 'if (off < len)'. Technically, the user could have read the first 10 bytes of a coredump and then gets "tate collected\n" as the remainder, for example. If we ensure that 'off' is zero then we know that this is a fresh read from scratch.

John.


        }
return ret;

Reply via email to