Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
That was a good hint. glmark2 sees a really nice 5% improvement with this change.

Christian.

On 30.08.2017 02:27, Marek Olšák wrote:
> It might be interesting to try glmark2.
>
> Marek
>
> On Tue, Aug 29, 2017 at 3:59 PM, Christian König wrote:
>> Ok, found something that works: Xonotic in the lowest resolution with the
>> lowest effects quality (i.e. totally CPU bound).
>>
>> Without per-process BOs: 189.30 FPS on average; with per-process BOs:
>> 199.93 FPS on average (full Phoronix Test Suite output below).
>>
>> [...]
>>
>> Well, that looks like some improvement.
>>
>> Regards,
>> Christian.

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
It might be interesting to try glmark2.

Marek

On Tue, Aug 29, 2017 at 3:59 PM, Christian König wrote:
> Ok, found something that works: Xonotic in the lowest resolution with the
> lowest effects quality (i.e. totally CPU bound).
>
> [...]
>
> Well, that looks like some improvement.
>
> Regards,
> Christian.
Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
Ok, found something that works: Xonotic in the lowest resolution with the lowest effects quality (i.e. totally CPU bound).

Without per-process BOs:

Xonotic 0.8:
pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 3 Minutes
Started Run 1 @ 21:13:50
Started Run 2 @ 21:14:57
Started Run 3 @ 21:16:03 [Std. Dev: 0.94%]

Test Results:
187.436577
189.514724
190.9605812

Average: 189.30 Frames Per Second
Minimum: 131
Maximum: 355

With per-process BOs:

Xonotic 0.8:
pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
Test 1 of 1
Estimated Trial Run Count: 3
Estimated Time To Completion: 3 Minutes
Started Run 1 @ 21:20:05
Started Run 2 @ 21:21:07
Started Run 3 @ 21:22:10 [Std. Dev: 1.49%]

Test Results:
203.0471676
199.6622532
197.0954183

Average: 199.93 Frames Per Second
Minimum: 132
Maximum: 349

Well, that looks like some improvement.

Regards,
Christian.

On 28.08.2017 14:59, Zhou, David (ChunMing) wrote:
> I will push our vulkan guys to test it; their bo list is very long.
>
> [...]
Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
I will push our vulkan guys to test it; their bo list is very long.

Sent from my Smartisan Pro

Christian König wrote on 2017-08-28 at 19:55:
> On 28.08.2017 06:21, zhoucm1 wrote:
>> [...]
>
> That's exactly the reason why I've added code to the BO destroy path to
> avoid at least some of the problems. But yeah, that's only the tip of the
> iceberg of problems with that approach.
>
> [...]
>
> Regards,
> Christian.
Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
On 28.08.2017 06:21, zhoucm1 wrote:
> [...]
>
> Thinking about it more: too many BOs sharing one reservation could mean
> the reservation lock is often contended. If evictions or destroys also
> happen frequently in the meantime, that could affect VM updates and CS
> submission as well.

That's exactly the reason why I've added code to the BO destroy path to avoid at least some of the problems. But yeah, that's only the tip of the iceberg of problems with that approach.

> Anyway, this is a very good start at reducing CS overhead, especially as
> we've seen it "reduces average CS overhead for 10K BOs from 21ms down to
> 48us".

Actually, it's not that good. See, this is a completely built-up test case on a kernel with lockdep and KASAN enabled. In reality we usually don't have that many BOs, and so far I wasn't able to find much of an improvement in any real-world testing.

Regards,
Christian.
Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
On 27.08.2017 18:03, Christian König wrote:
> [...]
>
> Don't cheer too loud yet, that is a completely constructed test case. So
> far I wasn't able to achieve any improvements with any real game on this
> with Mesa.

Thinking about it more: too many BOs sharing one reservation could mean the reservation lock is often contended. If evictions or destroys also happen frequently in the meantime, that could affect VM updates and CS submission as well.

Anyway, this is a very good start at reducing CS overhead, especially as we've seen it "reduces average CS overhead for 10K BOs from 21ms down to 48us".
Regards,
David Zhou

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  7 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 59 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c |  3 +-
 include/uapi/drm/amdgpu_drm.h             |  2 ++
 5 files changed, 51 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b1e817c..21cab36 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -457,9 +457,10 @@ struct amdgpu_sa_bo {
  */
 void amdgpu_gem_force_release(struct amdgpu_device *adev);
 int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
-			     int alignment, u32 initial_domain,
-			     u64 flags, bool kernel,
-			     struct drm_gem_object **obj);
+			     int alignment, u32 initial_domain,
+			     u64 flags, bool kernel,
+			     struct reservation_object *resv,
+			     struct drm_gem_object **obj);
 
 int amdgpu_mode_dumb_create(struct drm_file *file_priv,
 			    struct drm_device *dev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
index 0e907ea..7256f83 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
@@ -144,7 +144,7 @@ static int amdgpufb_create_pinned_object(struct amdgpu_fbdev *rfbdev,
 				       AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
 				       AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
 				       AMDGPU_GEM_CREATE_VRAM_CLEARED,
-				       true, &gobj);
+				       true, NULL, &gobj);
 	if (ret) {
 		pr_err("failed to allocate framebuffer (%d)\n", aligned_size);
 		return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d028806..b8e8d67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -44,11 +44,12 @@ void amdgpu_gem_object_free(struct drm_gem_object *gobj)
 }
 
 int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
-			     int alignment, u32 initial_domain,
-			     u64 flags, bool kernel,
-			     struct drm_gem_object **obj)
+			     int alignment, u32 initial_domain,
+			     u64 flags, bool kernel,
+			     struct reservation_object *resv,
+			     struct drm_gem_object **obj)
 {
-	struct amdgpu_bo *robj;
+	struct amdgpu_bo *bo;
 	int r;
 
 	*obj = NULL;
@@ -59,7 +60,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
 retry:
 	r = amdgpu_bo_create(adev, size, alignment, kernel, initial_domain,
-			     flags, NULL, NULL, 0, &robj);
+			     flags, NULL, resv, 0, &bo);
 	if (r) {
 		if (r != -ERESTARTSYS) {
 			if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
@@ -71,7 +72,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
 		}
 		return r;
 	}
-	*obj = &robj->gem_base;
+	*obj = &bo->gem_base;
 
 	return 0;
 }
@@ -136,13 +137,14 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,
 	struct amdgpu_vm *vm = &fpriv->vm;
 
 	struct amdgpu_bo_list_entry vm_pd;
-	struct list_head list;
+	struct list_head list, duplicates;
 	struct ttm_validate_buffer tv;
 	struct ww_acquire_ctx ticket;
 	struct amdgpu_bo_va *bo_va;
 	int r;
 
 	INIT_LIST_HEAD(&list);
+	INIT_LIST_HEAD(&duplicates);
 
 	tv.bo = &bo->tbo;
 	tv.shared = true;
@@ -150,7 +152,7 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,
 	amdgpu_vm_get_pd_bo(vm, &list
Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
On 25.08.2017 21:19, Christian König wrote:
> On 25.08.2017 18:22, Marek Olšák wrote:
>> [...]
>>
>> Do you have a branch that works on Raven? This patch series doesn't, and
>> I didn't investigate why.
>
> I will come up with one on Monday.

Branch vm_improvements on git://people.freedesktop.org/~deathsimple/linux together with the attached patch should work. Only tested on Tonga, but that's based on amd-staging-4.12 and so should work on Raven as well. If not, I still have a bug somewhere which needs to be fixed.

Thanks,
Christian.
> Regards,
> David Zhou
>
> [...]
Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
On 25.08.2017 23:31, Felix Kuehling wrote:
> That's clever. I was scratching my head wondering where the BOs were
> getting validated, just by sharing the VM reservation object, until I
> carefully read your previous commit.

Yeah, I didn't have time to properly comment on your last mail.

> In general, I find the VM code extremely frustrating and confusing to
> review. Too many lists of different things, and it's really hard to keep
> track of which list tracks what type of object, in which situation.
>
> For example, take struct amdgpu_vm:
>
>     /* BOs who needs a validation */
>     struct list_head        evicted;
>
>     /* BOs moved, but not yet updated in the PT */
>     struct list_head        moved;
>
>     /* BO mappings freed, but not yet updated in the PT */
>     struct list_head        freed;
>
> Three lists of BOs (according to the comments). But evicted and moved are
> lists of amdgpu_vm_bo_base, while freed is a list of amdgpu_bo_va_mapping.
> moved and freed are used for tracking BOs mapped in the VM. I think moved
> may also track page table BOs, but I'm not sure. evicted is used only for
> tracking page table BOs.
>
> In patch #7 you add relocated to the mix. Now it gets really funny. What's
> the difference between relocated and evicted and moved?

Essentially nothing. I actually tried to merge them, but then realized that this would probably remove the ability to update only the directories, and so most likely break your KFD usage of that.

> It seems PT BOs can be on any of these lists. I think evicted means the BO
> needs to be validated; moved or relocated means it's been validated but
> its mappings must be updated. For PT BOs and mapped BOs that means
> different things, so it makes sense to have different lists for them.
>
> But I think PT BOs can also end up on the moved list when
> amdgpu_vm_bo_invalidate is called for a page table BO (through
> amdgpu_bo_move_notify). So I'm still confused.

amdgpu_vm_bo_invalidate() has logic to always put PDs and PTs on the relocated list and everything else on the moved list, so PDs/PTs should never end up on the moved list.

> I think this could be clarified with more descriptive names for the lists.
> If PT BOs and mapped BOs must be tracked separately, that should be clear
> from the names.

That is a good idea, but in the long term I want to merge the relocated and moved lists and then decide while walking the list what to do. That is the next step, I think.

Regards,
Christian.

> Regards,
> Felix
>
> On 2017-08-25 05:38 AM, Christian König wrote:
>> From: Christian König
>>
>> Add the IOCTL interface so that applications can allocate per VM BOs.
>>
>> Still WIP since not all corner cases are tested yet, but this reduces
>> average CS overhead for 10K BOs from 21ms down to 48us.
>>
>> [...]
Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
That's clever. I was scratching my head wondering where the BOs were getting validated, just by sharing the VM reservation object, until I carefully read your previous commit.

In general, I find the VM code extremely frustrating and confusing to review. Too many lists of different things, and it's really hard to keep track of which list tracks what type of object, in which situation.

For example, take struct amdgpu_vm:

    /* BOs who needs a validation */
    struct list_head        evicted;

    /* BOs moved, but not yet updated in the PT */
    struct list_head        moved;

    /* BO mappings freed, but not yet updated in the PT */
    struct list_head        freed;

Three lists of BOs (according to the comments). But evicted and moved are lists of amdgpu_vm_bo_base, while freed is a list of amdgpu_bo_va_mapping. moved and freed are used for tracking BOs mapped in the VM. I think moved may also track page table BOs, but I'm not sure. evicted is used only for tracking page table BOs.

In patch #7 you add relocated to the mix. Now it gets really funny. What's the difference between relocated and evicted and moved? It seems PT BOs can be on any of these lists. I think evicted means the BO needs to be validated; moved or relocated means it's been validated but its mappings must be updated. For PT BOs and mapped BOs that means different things, so it makes sense to have different lists for them.

But I think PT BOs can also end up on the moved list when amdgpu_vm_bo_invalidate is called for a page table BO (through amdgpu_bo_move_notify). So I'm still confused.

I think this could be clarified with more descriptive names for the lists. If PT BOs and mapped BOs must be tracked separately, that should be clear from the names.

Regards,
Felix

On 2017-08-25 05:38 AM, Christian König wrote:
> From: Christian König
>
> Add the IOCTL interface so that applications can allocate per VM BOs.
>
> Still WIP since not all corner cases are tested yet, but this reduces
> average CS overhead for 10K BOs from 21ms down to 48us.
>
> [...]
Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
Am 25.08.2017 um 18:22 schrieb Marek Olšák: On Fri, Aug 25, 2017 at 3:00 PM, Christian König wrote: Am 25.08.2017 um 12:32 schrieb zhoucm1: On 2017年08月25日 17:38, Christian König wrote: From: Christian König Add the IOCTL interface so that applications can allocate per VM BOs. Still WIP since not all corner cases are tested yet, but this reduces average CS overhead for 10K BOs from 21ms down to 48us. Wow, cheers, eventually you get per vm bo to same reservation with PD/pts, indeed save a lot of bo list. Don't cheer to loud yet, that is a completely constructed test case. So far I wasn't able to archive any improvements with any real game on this with Mesa. BTW: Marek can you take a look with some CPU bound tests? I can provide a kernel branch if necessary. Do you have a branch that works on Raven? This patch series doesn't, and I didn't investigate why. I will come up with one on Monday. Have a nice weekend guys, Christian. Marek Regards, Christian. overall looks good, I will take a detailed check for this tomorrow. 
Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
On Fri, Aug 25, 2017 at 3:00 PM, Christian König wrote:
> Am 25.08.2017 um 12:32 schrieb zhoucm1:
>> On 2017年08月25日 17:38, Christian König wrote:
>>> From: Christian König
>>>
>>> Add the IOCTL interface so that applications can allocate per VM BOs.
>>>
>>> Still WIP since not all corner cases are tested yet, but this reduces
>>> average CS overhead for 10K BOs from 21ms down to 48us.
>>
>> Wow, cheers, eventually you get per vm bo to the same reservation with
>> PD/pts, which indeed saves a lot of bo list overhead.
>
> Don't cheer too loud yet, that is a completely constructed test case.
>
> So far I wasn't able to achieve any improvements with any real game on
> this with Mesa.
>
> BTW: Marek, can you take a look with some CPU bound tests? I can provide
> a kernel branch if necessary.

Do you have a branch that works on Raven? This patch series doesn't, and
I didn't investigate why.

Marek

> Regards,
> Christian.
>
>> overall looks good, I will take a detailed check for this tomorrow.
>>
>> Regards,
>> David Zhou
Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
Am 25.08.2017 um 12:32 schrieb zhoucm1:
> On 2017年08月25日 17:38, Christian König wrote:
>> From: Christian König
>>
>> Add the IOCTL interface so that applications can allocate per VM BOs.
>>
>> Still WIP since not all corner cases are tested yet, but this reduces
>> average CS overhead for 10K BOs from 21ms down to 48us.
>
> Wow, cheers, eventually you get per vm bo to the same reservation with
> PD/pts, which indeed saves a lot of bo list overhead.

Don't cheer too loud yet, that is a completely constructed test case.

So far I wasn't able to achieve any improvements with any real game on
this with Mesa.

BTW: Marek, can you take a look with some CPU bound tests? I can provide
a kernel branch if necessary.

Regards,
Christian.

> overall looks good, I will take a detailed check for this tomorrow.
>
> Regards,
> David Zhou
Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
On 2017年08月25日 17:38, Christian König wrote:
> From: Christian König
>
> Add the IOCTL interface so that applications can allocate per VM BOs.
>
> Still WIP since not all corner cases are tested yet, but this reduces
> average CS overhead for 10K BOs from 21ms down to 48us.

Wow, cheers, eventually you get per vm bo to the same reservation with
PD/pts, which indeed saves a lot of bo list overhead.

overall looks good, I will take a detailed check for this tomorrow.

Regards,
David Zhou
[PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs
From: Christian König

Add the IOCTL interface so that applications can allocate per VM BOs.

Still WIP since not all corner cases are tested yet, but this reduces
average CS overhead for 10K BOs from 21ms down to 48us.

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  7 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 59 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c |  3 +-
 include/uapi/drm/amdgpu_drm.h             |  2 ++
 5 files changed, 51 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b1e817c..21cab36 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -457,9 +457,10 @@ struct amdgpu_sa_bo {
  */
 void amdgpu_gem_force_release(struct amdgpu_device *adev);
 int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
-			     int alignment, u32 initial_domain,
-			     u64 flags, bool kernel,
-			     struct drm_gem_object **obj);
+			     int alignment, u32 initial_domain,
+			     u64 flags, bool kernel,
+			     struct reservation_object *resv,
+			     struct drm_gem_object **obj);
 
 int amdgpu_mode_dumb_create(struct drm_file *file_priv,
 			    struct drm_device *dev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
index 0e907ea..7256f83 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
@@ -144,7 +144,7 @@ static int amdgpufb_create_pinned_object(struct amdgpu_fbdev *rfbdev,
 				       AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
 				       AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
 				       AMDGPU_GEM_CREATE_VRAM_CLEARED,
-				       true, &gobj);
+				       true, NULL, &gobj);
 	if (ret) {
 		pr_err("failed to allocate framebuffer (%d)\n", aligned_size);
 		return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d028806..b8e8d67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -44,11 +44,12 @@ void amdgpu_gem_object_free(struct drm_gem_object *gobj)
 }
 
 int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
-			     int alignment, u32 initial_domain,
-			     u64 flags, bool kernel,
-			     struct drm_gem_object **obj)
+			     int alignment, u32 initial_domain,
+			     u64 flags, bool kernel,
+			     struct reservation_object *resv,
+			     struct drm_gem_object **obj)
 {
-	struct amdgpu_bo *robj;
+	struct amdgpu_bo *bo;
 	int r;
 
 	*obj = NULL;
@@ -59,7 +60,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
 retry:
 	r = amdgpu_bo_create(adev, size, alignment, kernel, initial_domain,
-			     flags, NULL, NULL, 0, &robj);
+			     flags, NULL, resv, 0, &bo);
 	if (r) {
 		if (r != -ERESTARTSYS) {
 			if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
@@ -71,7 +72,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
 		}
 		return r;
 	}
-	*obj = &robj->gem_base;
+	*obj = &bo->gem_base;
 	return 0;
 }
 
@@ -136,13 +137,14 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,
 	struct amdgpu_vm *vm = &fpriv->vm;
 	struct amdgpu_bo_list_entry vm_pd;
-	struct list_head list;
+	struct list_head list, duplicates;
 	struct ttm_validate_buffer tv;
 	struct ww_acquire_ctx ticket;
 	struct amdgpu_bo_va *bo_va;
 	int r;
 
 	INIT_LIST_HEAD(&list);
+	INIT_LIST_HEAD(&duplicates);
 
 	tv.bo = &bo->tbo;
 	tv.shared = true;
@@ -150,7 +152,7 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,
 	amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);
 
-	r = ttm_eu_reserve_buffers(&ticket, &list, false, NULL);
+	r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
 	if (r) {
 		dev_err(adev->dev, "leaking bo va because "
 			"we fail to reserve bo (%d)\n", r);
@@ -185,9 +187,12 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *filp)
 {
 	struct amdgpu_device *adev = dev->dev_private;
+	struct amdgpu_fpriv *fpriv = filp->driver_pr