RE: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
Oh, I see your point, but that actually presents for a cleanup patch, and mine is to add a condition to fix memory leak, I think they different purpose and should be separated, I can add one more patch to cleanup it with that "create_bo_kenel" to make code more tight and clean BR Monk -Original Message- From: Koenig, Christian Sent: 2017年9月18日 19:35 To: Liu, Monk <monk@amd.com>; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset Am 18.09.2017 um 12:47 schrieb Liu, Monk: > I didn't get your point... how could bo_create_kernel solve my issue ? It doesn't solve the underlying issue, you just need less code for your workaround. With bo_create_kernel you can do create/pin/kmap in just one function call. > > The thing here is during gpu reset we invoke hw_init for every hw > component, and by design hw_init shouldn't doing anything software > related, thus the BO allocating in hw_init is wrong, Yeah, but your patch doesn't fix that either as far as I can see. > Even switch to bo_create_kernel won't address the issue ... See the implementation of bo_create_kernel(): > if (!*bo_ptr) { > r = amdgpu_bo_create(adev, size, align, true, domain, > } > r = amdgpu_bo_pin(*bo_ptr, domain, gpu_addr); ... > if (cpu_addr) { > r = amdgpu_bo_kmap(*bo_ptr, cpu_addr); ... > } Creating is actually optional, but the function always pins the BO once more and figures out it's CPU address. As far as I can see that should solve your problem for now. Christian. > > > BR Monk > > -Original Message- > From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] > Sent: 2017年9月18日 17:13 > To: Liu, Monk <monk....@amd.com>; amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu > reset > > Am 18.09.2017 um 08:11 schrieb Monk Liu: >> doing gpu reset will rerun all hw_init and thus ucode_init_bo is >> invoked again, so we need to skip the fw_buf allocation during sriov >> gpu reset to avoid memory leak. >> >> Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d >> Signed-off-by: Monk Liu <monk@amd.com> >> --- >>drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 ++ >>drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 64 >> +++ >>2 files changed, 35 insertions(+), 32 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h >> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h >> index 6ff2959..3d0c633 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h >> @@ -1185,6 +1185,9 @@ struct amdgpu_firmware { >> >> /* gpu info firmware data pointer */ >> const struct firmware *gpu_info_fw; >> + >> +void *fw_buf_ptr; >> +uint64_t fw_buf_mc; >>}; >> >>/* >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c >> index f306374..6564902 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c >> @@ -360,8 +360,6 @@ static int amdgpu_ucode_patch_jt(struct >> amdgpu_firmware_info *ucode, >>int amdgpu_ucode_init_bo(struct amdgpu_device *adev) >>{ >> struct amdgpu_bo **bo = >firmware.fw_buf; >> -uint64_t fw_mc_addr; >> -void *fw_buf_ptr = NULL; >> uint64_t fw_offset = 0; >> int i, err; >> struct amdgpu_firmware_info *ucode = NULL; @@ -372,37 +370,39 @@ >> int amdgpu_ucode_init_bo(struct amdgpu_device *adev) >> return 0; >> } >> >> -err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true, >> -amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM >> : AMDGPU_GEM_DOMAIN_GTT, >> -AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, >> -NULL, NULL, 0, bo); >> -if (err) { >> -dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", >> err); >> -goto failed; >> -} >> +if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) { > Instead of all this better use amdgpu_bo_create_kernel(), this should already > include most of the handling necessary here. > > Christian. > >> +err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, >> true, >> +amdgpu_sriov_vf(adev) ? >> AMDGPU_GEM_DOMAIN_VRAM : AMD
Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
Am 18.09.2017 um 12:47 schrieb Liu, Monk: I didn't get your point... how could bo_create_kernel solve my issue ? It doesn't solve the underlying issue, you just need less code for your workaround. With bo_create_kernel you can do create/pin/kmap in just one function call. The thing here is during gpu reset we invoke hw_init for every hw component, and by design hw_init shouldn't doing anything software related, thus the BO allocating in hw_init is wrong, Yeah, but your patch doesn't fix that either as far as I can see. Even switch to bo_create_kernel won't address the issue ... See the implementation of bo_create_kernel(): if (!*bo_ptr) { r = amdgpu_bo_create(adev, size, align, true, domain, } r = amdgpu_bo_pin(*bo_ptr, domain, gpu_addr); ... if (cpu_addr) { r = amdgpu_bo_kmap(*bo_ptr, cpu_addr); ... } Creating is actually optional, but the function always pins the BO once more and figures out it's CPU address. As far as I can see that should solve your problem for now. Christian. BR Monk -Original Message- From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] Sent: 2017年9月18日 17:13 To: Liu, Monk <monk@amd.com>; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset Am 18.09.2017 um 08:11 schrieb Monk Liu: doing gpu reset will rerun all hw_init and thus ucode_init_bo is invoked again, so we need to skip the fw_buf allocation during sriov gpu reset to avoid memory leak. Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d Signed-off-by: Monk Liu <monk@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 64 +++ 2 files changed, 35 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 6ff2959..3d0c633 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1185,6 +1185,9 @@ struct amdgpu_firmware { /* gpu info firmware data pointer */ const struct firmware *gpu_info_fw; + + void *fw_buf_ptr; + uint64_t fw_buf_mc; }; /* diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c index f306374..6564902 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c @@ -360,8 +360,6 @@ static int amdgpu_ucode_patch_jt(struct amdgpu_firmware_info *ucode, int amdgpu_ucode_init_bo(struct amdgpu_device *adev) { struct amdgpu_bo **bo = >firmware.fw_buf; - uint64_t fw_mc_addr; - void *fw_buf_ptr = NULL; uint64_t fw_offset = 0; int i, err; struct amdgpu_firmware_info *ucode = NULL; @@ -372,37 +370,39 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev) return 0; } - err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true, - amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT, - AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, - NULL, NULL, 0, bo); - if (err) { - dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err); - goto failed; - } + if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) { Instead of all this better use amdgpu_bo_create_kernel(), this should already include most of the handling necessary here. Christian. + err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true, + amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT, + AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, + NULL, NULL, 0, bo); + if (err) { + dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err); + goto failed; + } - err = amdgpu_bo_reserve(*bo, false); - if (err) { - dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err); - goto failed_reserve; - } + err = amdgpu_bo_reserve(*bo, false); + if (err) { + dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err); + goto failed_reserve; + } - err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT, - _mc_addr); - if (err) { - dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err); - goto failed_pin; - } + err = a
RE: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
I didn't get your point... how could bo_create_kernel solve my issue ? The thing here is during gpu reset we invoke hw_init for every hw component, and by design hw_init shouldn't doing anything software related, thus the BO allocating in hw_init is wrong, Even switch to bo_create_kernel won't address the issue ... BR Monk -Original Message- From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] Sent: 2017年9月18日 17:13 To: Liu, Monk <monk@amd.com>; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset Am 18.09.2017 um 08:11 schrieb Monk Liu: > doing gpu reset will rerun all hw_init and thus ucode_init_bo is > invoked again, so we need to skip the fw_buf allocation during sriov > gpu reset to avoid memory leak. > > Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d > Signed-off-by: Monk Liu <monk@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 ++ > drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 64 > +++ > 2 files changed, 35 insertions(+), 32 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > index 6ff2959..3d0c633 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > @@ -1185,6 +1185,9 @@ struct amdgpu_firmware { > > /* gpu info firmware data pointer */ > const struct firmware *gpu_info_fw; > + > + void *fw_buf_ptr; > + uint64_t fw_buf_mc; > }; > > /* > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c > index f306374..6564902 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c > @@ -360,8 +360,6 @@ static int amdgpu_ucode_patch_jt(struct > amdgpu_firmware_info *ucode, > int amdgpu_ucode_init_bo(struct amdgpu_device *adev) > { > struct amdgpu_bo **bo = >firmware.fw_buf; > - uint64_t fw_mc_addr; > - void *fw_buf_ptr = NULL; > uint64_t fw_offset = 0; > int i, err; > struct amdgpu_firmware_info *ucode = NULL; @@ -372,37 +370,39 @@ > int amdgpu_ucode_init_bo(struct amdgpu_device *adev) > return 0; > } > > - err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true, > - amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM > : AMDGPU_GEM_DOMAIN_GTT, > - AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, > - NULL, NULL, 0, bo); > - if (err) { > - dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", > err); > - goto failed; > - } > + if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) { Instead of all this better use amdgpu_bo_create_kernel(), this should already include most of the handling necessary here. Christian. > + err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, > true, > + amdgpu_sriov_vf(adev) ? > AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT, > + AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, > + NULL, NULL, 0, bo); > + if (err) { > + dev_err(adev->dev, "(%d) Firmware buffer allocate > failed\n", err); > + goto failed; > + } > > - err = amdgpu_bo_reserve(*bo, false); > - if (err) { > - dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", > err); > - goto failed_reserve; > - } > + err = amdgpu_bo_reserve(*bo, false); > + if (err) { > + dev_err(adev->dev, "(%d) Firmware buffer reserve > failed\n", err); > + goto failed_reserve; > + } > > - err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM > : AMDGPU_GEM_DOMAIN_GTT, > - _mc_addr); > - if (err) { > - dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err); > - goto failed_pin; > - } > + err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? > AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT, > + >firmware.fw_buf_mc); > + if (err) { > + dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", > err); > + goto failed_pin; > + } > > - err = amdgpu_bo_kmap(*bo, _buf_ptr); > - if (err
Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
Am 18.09.2017 um 08:11 schrieb Monk Liu: doing gpu reset will rerun all hw_init and thus ucode_init_bo is invoked again, so we need to skip the fw_buf allocation during sriov gpu reset to avoid memory leak. Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d Signed-off-by: Monk Liu--- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 64 +++ 2 files changed, 35 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 6ff2959..3d0c633 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1185,6 +1185,9 @@ struct amdgpu_firmware { /* gpu info firmware data pointer */ const struct firmware *gpu_info_fw; + + void *fw_buf_ptr; + uint64_t fw_buf_mc; }; /* diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c index f306374..6564902 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c @@ -360,8 +360,6 @@ static int amdgpu_ucode_patch_jt(struct amdgpu_firmware_info *ucode, int amdgpu_ucode_init_bo(struct amdgpu_device *adev) { struct amdgpu_bo **bo = >firmware.fw_buf; - uint64_t fw_mc_addr; - void *fw_buf_ptr = NULL; uint64_t fw_offset = 0; int i, err; struct amdgpu_firmware_info *ucode = NULL; @@ -372,37 +370,39 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev) return 0; } - err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true, - amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT, - AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, - NULL, NULL, 0, bo); - if (err) { - dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err); - goto failed; - } + if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) { Instead of all this better use amdgpu_bo_create_kernel(), this should already include most of the handling necessary here. Christian. + err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true, + amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT, + AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, + NULL, NULL, 0, bo); + if (err) { + dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err); + goto failed; + } - err = amdgpu_bo_reserve(*bo, false); - if (err) { - dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err); - goto failed_reserve; - } + err = amdgpu_bo_reserve(*bo, false); + if (err) { + dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err); + goto failed_reserve; + } - err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT, - _mc_addr); - if (err) { - dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err); - goto failed_pin; - } + err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT, + >firmware.fw_buf_mc); + if (err) { + dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err); + goto failed_pin; + } - err = amdgpu_bo_kmap(*bo, _buf_ptr); - if (err) { - dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err); - goto failed_kmap; - } + err = amdgpu_bo_kmap(*bo, >firmware.fw_buf_ptr); + if (err) { + dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err); + goto failed_kmap; + } - amdgpu_bo_unreserve(*bo); + amdgpu_bo_unreserve(*bo); + } - memset(fw_buf_ptr, 0, adev->firmware.fw_size); + memset(adev->firmware.fw_buf_ptr, 0, adev->firmware.fw_size); /* * if SMU loaded firmware, it needn't add SMC, UVD, and VCE @@ -421,14 +421,14 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev) ucode = >firmware.ucode[i]; if (ucode->fw) { header = (const struct common_firmware_header *)ucode->fw->data; - amdgpu_ucode_init_single_fw(adev, ucode, fw_mc_addr + fw_offset, - (void *)((uint8_t *)fw_buf_ptr + fw_offset)); +