On Thu, Nov 6, 2025 at 12:21 PM <[email protected]> wrote:
>
> From: Vitaly Prosyak <[email protected]>
>
> Certain multi-GPU configurations (especially GFX12) may hit
> data corruption when a DCC-compressed VRAM surface is shared across GPUs
> using peer-to-peer (P2P) DMA transfers.
>
> Such surfaces rely on device-local metadata and cannot be safely accessed
> through a remote GPU’s page tables. Attempting to import a DCC-enabled
> surface through P2P leads to incorrect rendering or GPU faults.
>
> This change disables P2P for DCC-enabled VRAM buffers that are contiguous
> and allocated on GFX12+ hardware. In these cases, the importer falls back
> to the standard system-memory path, avoiding invalid access to compressed
> surfaces.
>
> Future work could consider optional migration (VRAM→System→VRAM) if a
> performance regression is observed when `attach->peer2peer = false`.
>
> Tested on:
> - Dual RX 9700 XT (Navi4x) setup
> - GNOME and Wayland compositor scenarios
> - Confirmed no corruption after disabling P2P under these conditions
>
> Suggested-by: Christian König <[email protected]>
> Signed-off-by: Vitaly Prosyak <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> index 9a0bce3ba24c..d2d31031f672 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> @@ -260,11 +260,24 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
>         struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>
>  #ifdef HAVE_STRUCT_DMA_BUF_ATTACH_OPS_ALLOW_PEER2PEER

This patch is against the DKMS tree; for upstream, please rebase before
you commit. With that fixed:
Reviewed-by: Alex Deucher <[email protected]>

> +       /*
> +        * Disable peer-to-peer access for DCC-enabled VRAM surfaces on
> GFX12+.
> +        * Such buffers cannot be safely accessed over P2P due to device-local
> +        * compression metadata. Fallback to system-memory path instead.
> +        * Device supports GFX12 (GC 12.x or newer)
> +        * BO was created with the AMDGPU_GEM_CREATE_GFX12_DCC flag
> +        *
> +        */
> +       if ((adev->ip_versions[GC_HWIP][0] >= IP_VERSION(12, 0, 0)) &&
> +           bo->flags & AMDGPU_GEM_CREATE_GFX12_DCC) {
> +               attach->peer2peer = false;
> +               goto update_vm;
> +       }
>         if (!amdgpu_dmabuf_is_xgmi_accessible(attach_adev, bo) &&
>             pci_p2pdma_distance(adev->pdev, attach->dev, false) < 0)
>                 attach->peer2peer = false;
>  #endif
> -
> +update_vm:
>         amdgpu_vm_bo_update_shared(bo);
>
>         return 0;
> --
> 2.51.2
>
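
For anyone following the thread, the attach-time decision in the quoted hunk can be modeled outside the kernel as a pure function. This is only a sketch under stated assumptions: `struct sketch_bo`, `sketch_allow_peer2peer()`, `SKETCH_IP_VERSION` and the `SKETCH_GEM_CREATE_GFX12_DCC` bit are hypothetical stand-ins and not the kernel definitions, and the XGMI and `pci_p2pdma_distance()` results are passed in as plain arguments rather than computed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for illustration only; NOT the kernel macros. */
#define SKETCH_IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))
#define SKETCH_GEM_CREATE_GFX12_DCC (UINT64_C(1) << 16) /* placeholder bit */

/* Hypothetical, reduced view of the exported buffer object. */
struct sketch_bo {
	uint32_t gc_ip_version; /* GC IP version of the exporting device */
	uint64_t flags;         /* creation flags of the buffer */
};

/*
 * Returns the value attach->peer2peer would end up with.
 * requested:       what the importer asked for before the checks
 * xgmi_accessible: stand-in for the XGMI accessibility check
 * p2p_distance:    stand-in for the PCI P2P distance (< 0 means no path)
 */
static bool sketch_allow_peer2peer(const struct sketch_bo *bo, bool requested,
				   bool xgmi_accessible, int p2p_distance)
{
	/* New check from the patch: GFX12+ device and DCC-flagged buffer. */
	if (bo->gc_ip_version >= SKETCH_IP_VERSION(12, 0, 0) &&
	    (bo->flags & SKETCH_GEM_CREATE_GFX12_DCC))
		return false;

	/* Existing check: no XGMI link and no usable PCI P2P path. */
	if (!xgmi_accessible && p2p_distance < 0)
		return false;

	/* Neither check fired, so the importer's request is left untouched. */
	return requested;
}
```

In this model a DCC-flagged buffer on a GC 12.x device always yields `false`, regardless of XGMI or PCI topology, which matches the early `goto update_vm` in the hunk.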
