[AMD Public Use]

If this causes an issue, any access to VRAM via the BAR could cause an issue.

Alex
________________________________
From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Russell, Kent <kent.russ...@amd.com>
Sent: Tuesday, April 14, 2020 10:19 AM
To: Koenig, Christian <christian.koe...@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Kuehling, Felix <felix.kuehl...@amd.com>; Kim, Jonathan <jonathan....@amd.com>
Subject: RE: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

[AMD Official Use Only - Internal Distribution Only]

On VG20 or MI100, as soon as we run the subtest, we get the dmesg output below, and then the kernel ends up hanging. I don't know enough about the test itself to know why this is occurring, but Jon Kim and Felix were discussing it on a separate thread when the issue was first reported, so they can hopefully provide some additional information.

Kent

> -----Original Message-----
> From: Christian König <ckoenig.leichtzumer...@gmail.com>
> Sent: Tuesday, April 14, 2020 9:52 AM
> To: Russell, Kent <kent.russ...@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"
>
> On 13.04.20 at 20:20, Kent Russell wrote:
> > This reverts commit c12b84d6e0d70f1185e6daddfd12afb671791b6e.
> > The original patch causes a RAS event and subsequent kernel hard-hang
> > when running the KFDMemoryTest.PtraceAccessInvisibleVram on VG20 and
> > Arcturus
> >
> > dmesg output at hang time:
> > [drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected!
> > amdgpu 0000:67:00.0: GPU reset begin!
> > Evicting PASID 0x8000 queues
> > Started evicting pasid 0x8000
> > qcm fence wait loop timeout expired
> > The cp might be in an unrecoverable state due to an unsuccessful queues preemption
> > Failed to evict process queues
> > Failed to suspend process 0x8000
> > Finished evicting pasid 0x8000
> > Started restoring pasid 0x8000
> > Finished restoring pasid 0x8000
> > [drm] UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT
> > amdgpu: [powerplay] Failed to send message 0x26, response 0x0
> > amdgpu: [powerplay] Failed to set soft min gfxclk !
> > amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
> > amdgpu: [powerplay] Failed to send message 0x7, response 0x0
> > amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu features!
> > amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features!
> > amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM!
> > [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -5
>
> Do you have more information on what's going wrong here since this is a really
> important patch for KFD debugging.
>
> >
> > Signed-off-by: Kent Russell <kent.russ...@amd.com>
>
> Reviewed-by: Christian König <christian.koe...@amd.com>
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 ----------------------
> >  1 file changed, 26 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index cf5d6e585634..a3f997f84020 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -254,32 +254,6 @@ void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
> >  	uint32_t hi = ~0;
> >  	uint64_t last;
> >
> > -
> > -#ifdef CONFIG_64BIT
> > -	last = min(pos + size, adev->gmc.visible_vram_size);
> > -	if (last > pos) {
> > -		void __iomem *addr = adev->mman.aper_base_kaddr + pos;
> > -		size_t count = last - pos;
> > -
> > -		if (write) {
> > -			memcpy_toio(addr, buf, count);
> > -			mb();
> > -			amdgpu_asic_flush_hdp(adev, NULL);
> > -		} else {
> > -			amdgpu_asic_invalidate_hdp(adev, NULL);
> > -			mb();
> > -			memcpy_fromio(buf, addr, count);
> > -		}
> > -
> > -		if (count == size)
> > -			return;
> > -
> > -		pos += count;
> > -		buf += count / 4;
> > -		size -= count;
> > -	}
> > -#endif
> > -
> >  	spin_lock_irqsave(&adev->mmio_idx_lock, flags);
> >  	for (last = pos + size; pos < last; pos += 4) {
> >  		uint32_t tmp = pos >> 31;

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
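
The hunk removed by the revert splits each request at adev->gmc.visible_vram_size: the part below that limit is copied through the BAR mapping, and whatever remains falls through to the register loop under mmio_idx_lock. A minimal user-space sketch of just that arithmetic may help when reasoning about which path a given test access takes. It assumes nothing beyond what the diff shows; `struct vram_split` and `split_vram_access` are hypothetical names for illustration, not amdgpu symbols, and no hardware is touched.

```c
#include <stdint.h>

/* Hypothetical illustration only: how many bytes of an access would go
 * through each path in the reverted hunk. */
struct vram_split {
	uint64_t bar_bytes;      /* copied through the CPU-visible BAR mapping */
	uint64_t fallback_bytes; /* left for the loop under mmio_idx_lock */
};

static struct vram_split split_vram_access(uint64_t pos, uint64_t size,
					   uint64_t visible_vram_size)
{
	/* Default: nothing is BAR-reachable, everything uses the fallback. */
	struct vram_split s = { 0, size };
	uint64_t end = pos + size;
	/* Mirrors: last = min(pos + size, adev->gmc.visible_vram_size); */
	uint64_t last = end < visible_vram_size ? end : visible_vram_size;

	if (last > pos) {
		/* Mirrors: count = last - pos; size -= count; */
		s.bar_bytes = last - pos;
		s.fallback_bytes = size - s.bar_bytes;
	}
	return s;
}
```

With a 64-byte visible window, for instance, a 32-byte access at offset 48 comes out as 16 bytes through the BAR and 16 bytes through the fallback, while an access starting at or beyond offset 64 never uses the BAR at all.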