It's based on v5.9-rc2, but it won't apply cleanly because it was applied on top of a significant number of amd-staging-drm-next patches.
Andrey
________________________________
From: Bjorn Helgaas <helg...@kernel.org>
Sent: 02 September 2020 17:36
To: Grodzovsky, Andrey <andrey.grodzov...@amd.com>
Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; sathyanarayanan.kuppusw...@linux.intel.com <sathyanarayanan.kuppusw...@linux.intel.com>; linux-...@vger.kernel.org <linux-...@vger.kernel.org>; Deucher, Alexander <alexander.deuc...@amd.com>; Das, Nirmoy <nirmoy....@amd.com>; Li, Dennis <dennis...@amd.com>; Koenig, Christian <christian.koe...@amd.com>; Tuikov, Luben <luben.tui...@amd.com>; bhelg...@google.com <bhelg...@google.com>
Subject: Re: [PATCH v4 0/8] Implement PCI Error Recovery on Navi12

On Wed, Sep 02, 2020 at 02:42:02PM -0400, Andrey Grodzovsky wrote:
> Many PCI bus controllers are able to detect a variety of hardware PCI errors
> on the bus, such as parity errors on the data and address buses. A typical
> action taken is to disconnect the affected device, halting all I/O to it.
> Typically, a reconnection mechanism is also offered, so that the affected
> PCI device(s) are reset and put back into working condition.
> In our case the reconnection mechanism is facilitated by kernel Downstream
> Port Containment (DPC) driver which will intercept the PCIe error, remove
> (isolate) the faulting device after which it will call into PCIe recovery
> code of the PCI core.
> This code will call hooks which are implemented in this patchset where the
> error is first reported at which point we block the GPU scheduler, next DPC
> resets the PCI link which generates HW interrupt which is intercepted by
> SMU/PSP who start executing mode1 reset of the ASIC, next step is slot reset
> hook is called at which point we wait for ASIC reset to complete, restore
> PCI config space and run HW suspend/resume sequence to reinit the ASIC.
> Last hook called is resume normal operation at which point we will restart
> the GPU scheduler.
>
> More info on PCIe error handling and DPC are here:
> https://www.kernel.org/doc/html/latest/PCI/pci-error-recovery.html
> https://patchwork.kernel.org/patch/8945681/
>
> v4:Rebase to 5.9 kernel and revert PCI error recovery core commit which
> breaks the feature.

What does this apply to?  I tried

  - v5.9-rc1 (9123e3a74ec7 ("Linux 5.9-rc1")),
  - v5.9-rc2 (d012a7190fc1 ("Linux 5.9-rc2")),
  - v5.9-rc3 (f75aef392f86 ("Linux 5.9-rc3")),
  - drm-next (3393649977f9 ("Merge tag 'drm-intel-next-2020-08-24-1' of
    git://anongit.freedesktop.org/drm/drm-intel into drm-next")),
  - linux-next (4442749a2031 ("Add linux-next specific files for
    20200902"))

but it doesn't apply cleanly to any.

> Andrey Grodzovsky (8):
>   drm/amdgpu: Avoid accessing HW when suspending SW state
>   drm/amdgpu: Block all job scheduling activity during DPC recovery
>   drm/amdgpu: Fix SMU error failure
>   drm/amdgpu: Fix consecutive DPC recovery failures.
>   drm/amdgpu: Trim amdgpu_pci_slot_reset by reusing code.
>   drm/amdgpu: Disable DPC for XGMI for now.
>   drm/amdgpu: Minor checkpatch fix
>   Revert "PCI/ERR: Update error status after reset_link()"
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   6 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 247 +++++++++++++++++++++--------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    |   6 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    |   6 +
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c     |  18 ++-
>  drivers/gpu/drm/amd/amdgpu/nv.c            |   4 +-
>  drivers/gpu/drm/amd/amdgpu/soc15.c         |   4 +-
>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c     |   3 +
>  drivers/pci/pcie/err.c                     |   3 +-
>  10 files changed, 222 insertions(+), 79 deletions(-)
>
> --
> 2.7.4
>
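
For anyone following the thread, the hook sequence described in the quoted cover letter can be modelled roughly in plain userspace C. This is only a sketch and not the patchset's code: every type and function name below is hypothetical, and the `gpu_model` fields merely stand in for the scheduler block, the mode1 ASIC reset, the config space restore and the HW re-init. In the kernel, the corresponding pieces are the `error_detected`, `slot_reset` and `resume` callbacks of `struct pci_error_handlers`, which return `pci_ers_result_t` values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's pci_ers_result_t (assumption:
 * only the three outcomes this sketch needs are modelled). */
typedef enum {
    MODEL_NEED_RESET,
    MODEL_RECOVERED,
    MODEL_DISCONNECT,
} model_result_t;

/* Hypothetical driver state; field names are illustrative only. */
struct gpu_model {
    bool sched_blocked;   /* GPU scheduler is stopped */
    bool asic_reset_done; /* mode1 reset by SMU/PSP has completed */
    bool config_restored; /* PCI config space has been restored */
    bool hw_ready;        /* ASIC has been re-initialised */
};

/* Stand-in for the three struct pci_error_handlers callbacks used here. */
struct model_err_handlers {
    model_result_t (*error_detected)(struct gpu_model *gpu);
    model_result_t (*slot_reset)(struct gpu_model *gpu);
    void (*resume)(struct gpu_model *gpu);
};

/* Hook 1: the error is reported, so block the scheduler and ask for a reset. */
model_result_t model_error_detected(struct gpu_model *gpu)
{
    gpu->sched_blocked = true;
    gpu->hw_ready = false;
    return MODEL_NEED_RESET;
}

/* Hook 2: runs after the link reset and needs the ASIC reset to be complete. */
model_result_t model_slot_reset(struct gpu_model *gpu)
{
    if (!gpu->asic_reset_done)
        return MODEL_DISCONNECT;
    gpu->config_restored = true; /* restore PCI config space */
    gpu->hw_ready = true;        /* suspend/resume sequence re-inits the ASIC */
    return MODEL_RECOVERED;
}

/* Hook 3: resume normal operation by restarting the scheduler. */
void model_resume(struct gpu_model *gpu)
{
    gpu->sched_blocked = false;
}

const struct model_err_handlers model_handlers = {
    .error_detected = model_error_detected,
    .slot_reset     = model_slot_reset,
    .resume         = model_resume,
};

/* Models the order in which the recovery code invokes the hooks.
 * link_reset_ok simulates whether the DPC link reset led to a
 * completed ASIC reset. */
model_result_t model_recover(const struct model_err_handlers *h,
                             struct gpu_model *gpu, bool link_reset_ok)
{
    assert(h != NULL && gpu != NULL);

    if (h->error_detected(gpu) != MODEL_NEED_RESET)
        return MODEL_DISCONNECT;

    gpu->asic_reset_done = link_reset_ok;

    if (h->slot_reset(gpu) != MODEL_RECOVERED)
        return MODEL_DISCONNECT;

    h->resume(gpu);
    return MODEL_RECOVERED;
}
```

Calling `model_recover(&model_handlers, &gpu, true)` on a zero-initialised `struct gpu_model` walks all three hooks and leaves the scheduler unblocked. Passing `false` stops after the slot reset hook and leaves the scheduler blocked, which in this model corresponds to the device staying isolated.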