Re: [PATCH] drm/amdkfd: dqm fence memory corruption
On 2021-03-26 at 5:38 a.m., Qu Huang wrote:
> On 2021/1/28 5:50, Felix Kuehling wrote:
>> On 2021-01-27 at 7:33 a.m., Qu Huang wrote:
>>> The amdgpu driver uses a 4-byte data type for the DQM fence memory
>>> and passes the GPU address of that memory to the microcode in a
>>> query status PM4 message. However, both the query status PM4
>>> message definition and the microcode treat the fence as 8 bytes.
>>> Only 4 bytes of fence memory are allocated, but the microcode
>>> writes 8 bytes, so the write overruns the allocation and corrupts
>>> memory.
>>
>> Thank you for pointing out that discrepancy. That's a good catch!
>>
>> I'd prefer to fix this properly by making dqm->fence_addr a u64 pointer.
>> We should probably also fix up the query_status and
>> amdkfd_fence_wait_timeout function interfaces to use 64-bit fence
>> values everywhere to be consistent.
>>
>> Regards,
>>   Felix
> Hi Felix,
>
> Thanks for your advice. Please check v2 at
> https://lore.kernel.org/patchwork/patch/1372584/

Thank you for the reminder. I somehow missed your v2 patch on the
mailing list. I have reviewed it and applied it to amd-staging-drm-next
now.

Regards,
  Felix

> Thanks,
> Qu.
>>
>>
>>>
>>> Signed-off-by: Qu Huang
>>> ---
>>>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> index e686ce2..8b38d0c 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> @@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
>>>  	pr_debug("Allocating fence memory\n");
>>>  
>>>  	/* allocate fence memory on the gart */
>>> -	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
>>> +	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
>>>  					&dqm->fence_mem);
>>>  
>>>  	if (retval)
>
Re: [PATCH] drm/amdkfd: dqm fence memory corruption
On 2021/1/28 5:50, Felix Kuehling wrote:
> On 2021-01-27 at 7:33 a.m., Qu Huang wrote:
>> The amdgpu driver uses a 4-byte data type for the DQM fence memory
>> and passes the GPU address of that memory to the microcode in a
>> query status PM4 message. However, both the query status PM4
>> message definition and the microcode treat the fence as 8 bytes.
>> Only 4 bytes of fence memory are allocated, but the microcode
>> writes 8 bytes, so the write overruns the allocation and corrupts
>> memory.
>
> Thank you for pointing out that discrepancy. That's a good catch!
>
> I'd prefer to fix this properly by making dqm->fence_addr a u64 pointer.
> We should probably also fix up the query_status and
> amdkfd_fence_wait_timeout function interfaces to use 64-bit fence
> values everywhere to be consistent.
>
> Regards,
>   Felix
Hi Felix,

Thanks for your advice. Please check v2 at
https://lore.kernel.org/patchwork/patch/1372584/

Thanks,
Qu.
>
>
>>
>> Signed-off-by: Qu Huang
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> index e686ce2..8b38d0c 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> @@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
>>  	pr_debug("Allocating fence memory\n");
>>  
>>  	/* allocate fence memory on the gart */
>> -	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
>> +	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
>>  					&dqm->fence_mem);
>>  
>>  	if (retval)
Re: [PATCH] drm/amdkfd: dqm fence memory corruption
On 2021-01-27 at 7:33 a.m., Qu Huang wrote:
> The amdgpu driver uses a 4-byte data type for the DQM fence memory
> and passes the GPU address of that memory to the microcode in a
> query status PM4 message. However, both the query status PM4
> message definition and the microcode treat the fence as 8 bytes.
> Only 4 bytes of fence memory are allocated, but the microcode
> writes 8 bytes, so the write overruns the allocation and corrupts
> memory.

Thank you for pointing out that discrepancy. That's a good catch!

I'd prefer to fix this properly by making dqm->fence_addr a u64 pointer.
We should probably also fix up the query_status and
amdkfd_fence_wait_timeout function interfaces to use 64-bit fence
values everywhere to be consistent.

Regards,
  Felix

>
> Signed-off-by: Qu Huang
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e686ce2..8b38d0c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
>  	pr_debug("Allocating fence memory\n");
>  
>  	/* allocate fence memory on the gart */
> -	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
> +	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
>  					&dqm->fence_mem);
>  
>  	if (retval)
[PATCH] drm/amdkfd: dqm fence memory corruption
The amdgpu driver uses a 4-byte data type for the DQM fence memory
and passes the GPU address of that memory to the microcode in a
query status PM4 message. However, both the query status PM4
message definition and the microcode treat the fence as 8 bytes.
Only 4 bytes of fence memory are allocated, but the microcode
writes 8 bytes, so the write overruns the allocation and corrupts
memory.

Signed-off-by: Qu Huang
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e686ce2..8b38d0c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
 	pr_debug("Allocating fence memory\n");
 
 	/* allocate fence memory on the gart */
-	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
+	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
 					&dqm->fence_mem);
 
 	if (retval)
-- 
1.8.3.1