Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-30 Thread Christian König
That was a good hint. glmark2 sees a really nice 5% improvement with 
this change.


Christian.

On 30.08.2017 at 02:27, Marek Olšák wrote:

It might be interesting to try glmark2.

Marek

On Tue, Aug 29, 2017 at 3:59 PM, Christian König
 wrote:

Ok, found something that works. Xonotic at the lowest resolution, lowest effects
quality (i.e. totally CPU bound):

Without per process BOs:

Xonotic 0.8:
 pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
 Test 1 of 1
 Estimated Trial Run Count:3
 Estimated Time To Completion: 3 Minutes
 Started Run 1 @ 21:13:50
 Started Run 2 @ 21:14:57
 Started Run 3 @ 21:16:03  [Std. Dev: 0.94%]

 Test Results:
 187.436577
 189.514724
 190.9605812

 Average: 189.30 Frames Per Second
 Minimum: 131
 Maximum: 355

With per process BOs:

Xonotic 0.8:
 pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
 Test 1 of 1
 Estimated Trial Run Count:3
 Estimated Time To Completion: 3 Minutes
 Started Run 1 @ 21:20:05
 Started Run 2 @ 21:21:07
 Started Run 3 @ 21:22:10  [Std. Dev: 1.49%]

 Test Results:
 203.0471676
 199.6622532
 197.0954183

 Average: 199.93 Frames Per Second
 Minimum: 132
 Maximum: 349

Well that looks like some improvement.

Regards,
Christian.


On 28.08.2017 at 14:59, Zhou, David(ChunMing) wrote:

I will push our Vulkan guys to test it; their BO list is very long.

Sent from my Smartisan Pro

Christian König  wrote on 2017-08-28 at 7:55 PM:

On 28.08.2017 at 06:21, zhoucm1 wrote:


On 2017-08-27 18:03, Christian König wrote:

On 25.08.2017 at 21:19, Christian König wrote:

On 25.08.2017 at 18:22, Marek Olšák wrote:

On Fri, Aug 25, 2017 at 3:00 PM, Christian König
 wrote:

On 25.08.2017 at 12:32, zhoucm1 wrote:


On 2017-08-25 17:38, Christian König wrote:

From: Christian König 

Add the IOCTL interface so that applications can allocate per VM
BOs.

Still WIP since not all corner cases are tested yet, but this
reduces
average
CS overhead for 10K BOs from 21ms down to 48us.

Wow, cheers! You finally get per-VM BOs onto the same reservation object
as the PD/PTs, which indeed saves a lot of BO list handling.

Don't cheer too loudly yet; that is a completely constructed test case.

So far I wasn't able to achieve any improvement with any real
game on this with Mesa.

Thinking about it more, too many BOs sharing one reservation object
could keep the reservation lock busy; if evictions or destroys also
happen often in the meantime, that could affect VM updates
and CS submission as well.

That's exactly the reason why I've added code to the BO destroy path to
avoid at least some of the problems. But yeah, that's only the tip of
the iceberg of problems with that approach.


Anyway, this is a very good start at reducing CS overhead,
especially since we've seen it "reduces average CS overhead for 10K BOs
from 21ms down to 48us".

Actually, it's not that good. Keep in mind this is a completely
artificial test case, run on a kernel with lockdep and KASAN enabled.

In reality we usually don't have so many BOs, and so far I wasn't able to
find much of an improvement in any real-world testing.

Regards,
Christian.


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx






Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-29 Thread Marek Olšák
It might be interesting to try glmark2.

Marek

On Tue, Aug 29, 2017 at 3:59 PM, Christian König
 wrote:
> Ok, found something that works. Xonotic at the lowest resolution, lowest effects
> quality (i.e. totally CPU bound):
>
> Without per process BOs:
>
> Xonotic 0.8:
> pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
> Test 1 of 1
> Estimated Trial Run Count:3
> Estimated Time To Completion: 3 Minutes
> Started Run 1 @ 21:13:50
> Started Run 2 @ 21:14:57
> Started Run 3 @ 21:16:03  [Std. Dev: 0.94%]
>
> Test Results:
> 187.436577
> 189.514724
> 190.9605812
>
> Average: 189.30 Frames Per Second
> Minimum: 131
> Maximum: 355
>
> With per process BOs:
>
> Xonotic 0.8:
> pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
> Test 1 of 1
> Estimated Trial Run Count:3
> Estimated Time To Completion: 3 Minutes
> Started Run 1 @ 21:20:05
> Started Run 2 @ 21:21:07
> Started Run 3 @ 21:22:10  [Std. Dev: 1.49%]
>
> Test Results:
> 203.0471676
> 199.6622532
> 197.0954183
>
> Average: 199.93 Frames Per Second
> Minimum: 132
> Maximum: 349
>
> Well that looks like some improvement.
>
> Regards,
> Christian.
>
>
> On 28.08.2017 at 14:59, Zhou, David(ChunMing) wrote:
>
> I will push our Vulkan guys to test it; their BO list is very long.
>
> Sent from my Smartisan Pro
>
> Christian König  wrote on 2017-08-28 at 7:55 PM:
>
> On 28.08.2017 at 06:21, zhoucm1 wrote:
>>
>>
>> On 2017-08-27 18:03, Christian König wrote:
>>> On 25.08.2017 at 21:19, Christian König wrote:
 On 25.08.2017 at 18:22, Marek Olšák wrote:
> On Fri, Aug 25, 2017 at 3:00 PM, Christian König
>  wrote:
>> On 25.08.2017 at 12:32, zhoucm1 wrote:
>>>
>>>
>>> On 2017-08-25 17:38, Christian König wrote:
 From: Christian König 

 Add the IOCTL interface so that applications can allocate per VM
 BOs.

 Still WIP since not all corner cases are tested yet, but this
 reduces
 average
 CS overhead for 10K BOs from 21ms down to 48us.
>>> Wow, cheers! You finally get per-VM BOs onto the same reservation
>>> object as the PD/PTs, which indeed saves a lot of BO list handling.
>>
>> Don't cheer too loudly yet; that is a completely constructed test case.
>>
>> So far I wasn't able to achieve any improvement with any real
>> game on this
>> with Mesa.
>> Thinking about it more, too many BOs sharing one reservation object
>> could keep the reservation lock busy; if evictions or destroys also
>> happen often in the meantime, that could affect VM updates
>> and CS submission as well.
>
> That's exactly the reason why I've added code to the BO destroy path to
> avoid at least some of the problems. But yeah, that's only the tip of
> the iceberg of problems with that approach.
>
>> Anyway, this is a very good start at reducing CS overhead,
>> especially since we've seen it "reduces average CS overhead for 10K BOs
>> from 21ms down to 48us".
>
> Actually, it's not that good. Keep in mind this is a completely
> artificial test case, run on a kernel with lockdep and KASAN enabled.
>
> In reality we usually don't have so many BOs, and so far I wasn't able to
> find much of an improvement in any real-world testing.
>
> Regards,
> Christian.
>
>


Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-29 Thread Christian König
Ok, found something that works. Xonotic at the lowest resolution, lowest
effects quality (i.e. totally CPU bound):


Without per process BOs:

Xonotic 0.8:
pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
Test 1 of 1
Estimated Trial Run Count:3
Estimated Time To Completion: 3 Minutes
Started Run 1 @ 21:13:50
Started Run 2 @ 21:14:57
Started Run 3 @ 21:16:03  [Std. Dev: 0.94%]

Test Results:
187.436577
189.514724
190.9605812

Average: 189.30 Frames Per Second
Minimum: 131
Maximum: 355

With per process BOs:

Xonotic 0.8:
pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
Test 1 of 1
Estimated Trial Run Count:3
Estimated Time To Completion: 3 Minutes
Started Run 1 @ 21:20:05
Started Run 2 @ 21:21:07
Started Run 3 @ 21:22:10  [Std. Dev: 1.49%]

Test Results:
203.0471676
199.6622532
197.0954183

Average: 199.93 Frames Per Second
Minimum: 132
Maximum: 349

Well that looks like some improvement.

Regards,
Christian.

On 28.08.2017 at 14:59, Zhou, David(ChunMing) wrote:

I will push our Vulkan guys to test it; their BO list is very long.

Sent from my Smartisan Pro

Christian König  wrote on 2017-08-28 at 7:55 PM:

On 28.08.2017 at 06:21, zhoucm1 wrote:
>
>
> On 2017-08-27 18:03, Christian König wrote:
>> On 25.08.2017 at 21:19, Christian König wrote:
>>> On 25.08.2017 at 18:22, Marek Olšák wrote:
 On Fri, Aug 25, 2017 at 3:00 PM, Christian König
  wrote:
> On 25.08.2017 at 12:32, zhoucm1 wrote:
>>
>>
>> On 2017-08-25 17:38, Christian König wrote:
>>> From: Christian König 
>>>
>>> Add the IOCTL interface so that applications can allocate per VM
>>> BOs.
>>>
>>> Still WIP since not all corner cases are tested yet, but this
>>> reduces
>>> average
>>> CS overhead for 10K BOs from 21ms down to 48us.
>> Wow, cheers! You finally get per-VM BOs onto the same reservation
>> object as the PD/PTs, which indeed saves a lot of BO list handling.
>
> Don't cheer too loudly yet; that is a completely constructed test case.
>
> So far I wasn't able to achieve any improvement with any real
> game on this
> with Mesa.
> Thinking about it more, too many BOs sharing one reservation object
> could keep the reservation lock busy; if evictions or destroys also
> happen often in the meantime, that could affect VM updates
> and CS submission as well.

That's exactly the reason why I've added code to the BO destroy path to
avoid at least some of the problems. But yeah, that's only the tip of
the iceberg of problems with that approach.

> Anyway, this is a very good start at reducing CS overhead,
> especially since we've seen it "reduces average CS overhead for 10K BOs
> from 21ms down to 48us".

Actually, it's not that good. Keep in mind this is a completely
artificial test case, run on a kernel with lockdep and KASAN enabled.

In reality we usually don't have so many BOs, and so far I wasn't able to
find much of an improvement in any real-world testing.

Regards,
Christian.




Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-28 Thread Zhou, David(ChunMing)
I will push our Vulkan guys to test it; their BO list is very long.


Sent from my Smartisan Pro

Christian König  wrote on 2017-08-28 at 7:55 PM:

On 28.08.2017 at 06:21, zhoucm1 wrote:
>
>
> On 2017-08-27 18:03, Christian König wrote:
>> On 25.08.2017 at 21:19, Christian König wrote:
>>> On 25.08.2017 at 18:22, Marek Olšák wrote:
 On Fri, Aug 25, 2017 at 3:00 PM, Christian König
  wrote:
> On 25.08.2017 at 12:32, zhoucm1 wrote:
>>
>>
>> On 2017-08-25 17:38, Christian König wrote:
>>> From: Christian König 
>>>
>>> Add the IOCTL interface so that applications can allocate per VM
>>> BOs.
>>>
>>> Still WIP since not all corner cases are tested yet, but this
>>> reduces
>>> average
>>> CS overhead for 10K BOs from 21ms down to 48us.
>> Wow, cheers! You finally get per-VM BOs onto the same reservation
>> object as the PD/PTs, which indeed saves a lot of BO list handling.
>
> Don't cheer too loudly yet; that is a completely constructed test case.
>
> So far I wasn't able to achieve any improvement with any real
> game on this
> with Mesa.
> Thinking about it more, too many BOs sharing one reservation object
> could keep the reservation lock busy; if evictions or destroys also
> happen often in the meantime, that could affect VM updates
> and CS submission as well.

That's exactly the reason why I've added code to the BO destroy path to
avoid at least some of the problems. But yeah, that's only the tip of
the iceberg of problems with that approach.

> Anyway, this is a very good start at reducing CS overhead,
> especially since we've seen it "reduces average CS overhead for 10K BOs
> from 21ms down to 48us".

Actually, it's not that good. Keep in mind this is a completely
artificial test case, run on a kernel with lockdep and KASAN enabled.

In reality we usually don't have so many BOs, and so far I wasn't able to
find much of an improvement in any real-world testing.

Regards,
Christian.


Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-28 Thread Christian König

On 28.08.2017 at 06:21, zhoucm1 wrote:



On 2017-08-27 18:03, Christian König wrote:

On 25.08.2017 at 21:19, Christian König wrote:

On 25.08.2017 at 18:22, Marek Olšák wrote:

On Fri, Aug 25, 2017 at 3:00 PM, Christian König
 wrote:

On 25.08.2017 at 12:32, zhoucm1 wrote:



On 2017-08-25 17:38, Christian König wrote:

From: Christian König 

Add the IOCTL interface so that applications can allocate per VM 
BOs.


Still WIP since not all corner cases are tested yet, but this 
reduces

average
CS overhead for 10K BOs from 21ms down to 48us.
Wow, cheers! You finally get per-VM BOs onto the same reservation object
as the PD/PTs, which indeed saves a lot of BO list handling.


Don't cheer too loudly yet; that is a completely constructed test case.

So far I wasn't able to achieve any improvement with any real
game on this with Mesa.

Thinking about it more, too many BOs sharing one reservation object
could keep the reservation lock busy; if evictions or destroys also
happen often in the meantime, that could affect VM updates
and CS submission as well.


That's exactly the reason why I've added code to the BO destroy path to 
avoid at least some of the problems. But yeah, that's only the tip of 
the iceberg of problems with that approach.


Anyway, this is a very good start at reducing CS overhead, especially
since we've seen it "reduces average CS overhead for 10K BOs from 21ms
down to 48us".


Actually, it's not that good. Keep in mind this is a completely
artificial test case, run on a kernel with lockdep and KASAN enabled.


In reality we usually don't have so many BOs, and so far I wasn't able to
find much of an improvement in any real-world testing.


Regards,
Christian.


Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-27 Thread zhoucm1



On 2017-08-27 18:03, Christian König wrote:

On 25.08.2017 at 21:19, Christian König wrote:

On 25.08.2017 at 18:22, Marek Olšák wrote:

On Fri, Aug 25, 2017 at 3:00 PM, Christian König
 wrote:

On 25.08.2017 at 12:32, zhoucm1 wrote:



On 2017-08-25 17:38, Christian König wrote:

From: Christian König 

Add the IOCTL interface so that applications can allocate per VM 
BOs.


Still WIP since not all corner cases are tested yet, but this 
reduces

average
CS overhead for 10K BOs from 21ms down to 48us.
Wow, cheers! You finally get per-VM BOs onto the same reservation object
as the PD/PTs, which indeed saves a lot of BO list handling.


Don't cheer too loudly yet; that is a completely constructed test case.

So far I wasn't able to achieve any improvement with any real game
on this with Mesa.

Thinking about it more, too many BOs sharing one reservation object
could keep the reservation lock busy; if evictions or destroys also
happen often in the meantime, that could affect VM updates and
CS submission as well.


Anyway, this is a very good start at reducing CS overhead, especially
since we've seen it "reduces average CS overhead for 10K BOs from 21ms
down to 48us".


Regards,
David Zhou


BTW: Marek, can you take a look with some CPU-bound tests? I can
provide a kernel branch if necessary.

Do you have a branch that works on Raven? This patch series doesn't,
and I didn't investigate why.


I will come up with one on Monday.


Branch vm_improvements on 
git://people.freedesktop.org/~deathsimple/linux together with the 
attached patch should work.


I've only tested on Tonga, but that's based on amd-staging-4.12 and so
should work on Raven as well. If not, I still have a bug somewhere
which needs to be fixed.


Thanks,
Christian.


Have a nice weekend guys,
Christian.



Marek


Regards,
Christian.



Overall this looks good; I will take a detailed look at it tomorrow.

Regards,
David Zhou


Signed-off-by: Christian König 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  7 ++--
   drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c|  2 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 59 ++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c |  3 +-
   include/uapi/drm/amdgpu_drm.h |  2 ++
   5 files changed, 51 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b1e817c..21cab36 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -457,9 +457,10 @@ struct amdgpu_sa_bo {
*/
   void amdgpu_gem_force_release(struct amdgpu_device *adev);
   int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
-int alignment, u32 initial_domain,
-u64 flags, bool kernel,
-struct drm_gem_object **obj);
+ int alignment, u32 initial_domain,
+ u64 flags, bool kernel,
+ struct reservation_object *resv,
+ struct drm_gem_object **obj);
 int amdgpu_mode_dumb_create(struct drm_file *file_priv,
   struct drm_device *dev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
index 0e907ea..7256f83 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
@@ -144,7 +144,7 @@ static int amdgpufb_create_pinned_object(struct
amdgpu_fbdev *rfbdev,
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
AMDGPU_GEM_CREATE_VRAM_CLEARED,
-   true, &gobj);
+   true, NULL, &gobj);
   if (ret) {
   pr_err("failed to allocate framebuffer (%d)\n", 
aligned_size);

   return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d028806..b8e8d67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -44,11 +44,12 @@ void amdgpu_gem_object_free(struct drm_gem_object *gobj)
   }
  int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
-int alignment, u32 initial_domain,
-u64 flags, bool kernel,
-struct drm_gem_object **obj)
+ int alignment, u32 initial_domain,
+ u64 flags, bool kernel,
+ struct reservation_object *resv,
+ struct drm_gem_object **obj)
   {
-struct amdgpu_bo *robj;
+struct amdgpu_bo *bo;
   int r;
 *obj = NULL;
@@ -59,7 +60,7 @@ int amdgpu_gem_object_create(struct amdgpu_device
*adev, unsigned long size,
 retry:
   r = amdgpu_bo_create(adev, size, alignment, kernel, initial_domain,
- flags, NULL, NULL, 0, &robj);
+ flags, NULL, resv, 0, &bo);
   if (r) {
   if (r != -ERESTARTSYS) {
   if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
@@ -71,7 +72,7 @@ int amdgpu_gem_object_

Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-27 Thread Christian König

On 25.08.2017 at 21:19, Christian König wrote:

On 25.08.2017 at 18:22, Marek Olšák wrote:

On Fri, Aug 25, 2017 at 3:00 PM, Christian König
 wrote:

On 25.08.2017 at 12:32, zhoucm1 wrote:



On 2017-08-25 17:38, Christian König wrote:

From: Christian König 

Add the IOCTL interface so that applications can allocate per VM BOs.

Still WIP since not all corner cases are tested yet, but this reduces
average
CS overhead for 10K BOs from 21ms down to 48us.
Wow, cheers! You finally get per-VM BOs onto the same reservation object
as the PD/PTs, which indeed saves a lot of BO list handling.


Don't cheer too loudly yet; that is a completely constructed test case.

So far I wasn't able to achieve any improvement with any real game
on this with Mesa.

BTW: Marek, can you take a look with some CPU-bound tests? I can
provide a kernel branch if necessary.

Do you have a branch that works on Raven? This patch series doesn't,
and I didn't investigate why.


I will come up with one on Monday.


Branch vm_improvements on 
git://people.freedesktop.org/~deathsimple/linux together with the 
attached patch should work.


I've only tested on Tonga, but that's based on amd-staging-4.12 and so
should work on Raven as well. If not, I still have a bug somewhere which
needs to be fixed.


Thanks,
Christian.


Have a nice weekend guys,
Christian.



Marek


Regards,
Christian.



Overall this looks good; I will take a detailed look at it tomorrow.

Regards,
David Zhou


Signed-off-by: Christian König 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  7 ++--
   drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c|  2 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 59 ++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c |  3 +-
   include/uapi/drm/amdgpu_drm.h |  2 ++
   5 files changed, 51 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b1e817c..21cab36 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -457,9 +457,10 @@ struct amdgpu_sa_bo {
*/
   void amdgpu_gem_force_release(struct amdgpu_device *adev);
   int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long

size,
-int alignment, u32 initial_domain,
-u64 flags, bool kernel,
-struct drm_gem_object **obj);
+ int alignment, u32 initial_domain,
+ u64 flags, bool kernel,
+ struct reservation_object *resv,
+ struct drm_gem_object **obj);
 int amdgpu_mode_dumb_create(struct drm_file *file_priv,
   struct drm_device *dev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
index 0e907ea..7256f83 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
@@ -144,7 +144,7 @@ static int amdgpufb_create_pinned_object(struct
amdgpu_fbdev *rfbdev,
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
  AMDGPU_GEM_CREATE_VRAM_CLEARED,
-   true, &gobj);
+   true, NULL, &gobj);
   if (ret) {
   pr_err("failed to allocate framebuffer (%d)\n", 
aligned_size);

   return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d028806..b8e8d67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -44,11 +44,12 @@ void amdgpu_gem_object_free(struct drm_gem_object
*gobj)
   }
 int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned

long size,
-int alignment, u32 initial_domain,
-u64 flags, bool kernel,
-struct drm_gem_object **obj)
+ int alignment, u32 initial_domain,
+ u64 flags, bool kernel,
+ struct reservation_object *resv,
+ struct drm_gem_object **obj)
   {
-struct amdgpu_bo *robj;
+struct amdgpu_bo *bo;
   int r;
 *obj = NULL;
@@ -59,7 +60,7 @@ int amdgpu_gem_object_create(struct amdgpu_device
*adev, unsigned long size,
 retry:
   r = amdgpu_bo_create(adev, size, alignment, kernel, 
initial_domain,

- flags, NULL, NULL, 0, &robj);
+ flags, NULL, resv, 0, &bo);
   if (r) {
   if (r != -ERESTARTSYS) {
   if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
@@ -71,7 +72,7 @@ int amdgpu_gem_object_create(struct amdgpu_device
*adev, unsigned long size,
   }
   return r;
   }
-*obj = &robj->gem_base;
+*obj = &bo->gem_base;
 return 0;
   }
@@ -136,13 +137,14 @@ void amdgpu_gem_object_close(struct 
drm_gem_object

*obj,
   struct amdgpu_vm *vm = &fpriv->vm;
 struct amdgpu_bo_list_entry vm_pd;
-struct list_head list;
+struct list_head list, duplicates;
   struct ttm_validate_buffer tv;

Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-26 Thread Christian König

On 25.08.2017 at 23:31, Felix Kuehling wrote:

That's clever. I was scratching my head where the BOs were getting
validated, just by sharing the VM reservation object. Until I carefully
read your previous commit.


Yeah, I didn't have time to properly comment on your last mail.


In general, I find the VM code extremely frustrating and confusing to
review. Too many lists of different things, and it's really hard to keep
track of which list tracks what type of object, in which situation.

For example, take struct amdgpu_vm:

/* BOs who needs a validation */
struct list_headevicted;

/* BOs moved, but not yet updated in the PT */
struct list_headmoved;

/* BO mappings freed, but not yet updated in the PT */
struct list_headfreed;

Three lists of BOs (according to the comments). But evicted and moved
are lists of amdgpu_vm_bo_base, freed is a list of amdgpu_bo_va_mapping.
moved and freed are used for tracking BOs mapped in the VM. I think
moved may also track page table BOs, but I'm not sure. evicted is used
only for tracking page table BOs.

In patch #7 you add relocated to the mix. Now it gets really funny.
What's the difference between relocated and evicted and moved?


Essentially nothing. I actually tried to merge them, but then realized
that this would probably remove the ability to update only the page
directories, and with it most likely your KFD usage of that.



It seems
PT BOs can be on any of these lists. I think evicted means the BO needs
to be validated. moved or relocated means it's been validated but its
mappings must be updated. For PT BOs and mapped BOs that means different
things, so it makes sense to have different lists for them. But I think
PT BOs can also end up on the moved list when amdgpu_vm_bo_invalidate is
called for a page table BO (through amdgpu_bo_move_notify). So I'm still
confused.


amdgpu_vm_bo_invalidate() has logic to always put PDs and PTs on the 
relocated list and everything else on the moved list. So PDs/PTs should 
never end up on the moved list.



I think this could be clarified with more descriptive names for the
lists. If PT BOs and mapped BOs must be tracked separately that should
be clear from the names.


That is a good idea, but in the long term I want to merge the relocated
and moved lists and then decide what to do while walking the list. That
is the next step, I think.


Regards,
Christian.





Regards,
   Felix


On 2017-08-25 05:38 AM, Christian König wrote:

From: Christian König 

Add the IOCTL interface so that applications can allocate per VM BOs.

Still WIP since not all corner cases are tested yet, but this reduces average
CS overhead for 10K BOs from 21ms down to 48us.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  7 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c|  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 59 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c |  3 +-
  include/uapi/drm/amdgpu_drm.h |  2 ++
  5 files changed, 51 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b1e817c..21cab36 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -457,9 +457,10 @@ struct amdgpu_sa_bo {
   */
  void amdgpu_gem_force_release(struct amdgpu_device *adev);
  int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
-   int alignment, u32 initial_domain,
-   u64 flags, bool kernel,
-   struct drm_gem_object **obj);
+int alignment, u32 initial_domain,
+u64 flags, bool kernel,
+struct reservation_object *resv,
+struct drm_gem_object **obj);
  
  int amdgpu_mode_dumb_create(struct drm_file *file_priv,

struct drm_device *dev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
index 0e907ea..7256f83 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
@@ -144,7 +144,7 @@ static int amdgpufb_create_pinned_object(struct 
amdgpu_fbdev *rfbdev,
   AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
   AMDGPU_GEM_CREATE_VRAM_CLEARED,
-  true, &gobj);
+  true, NULL, &gobj);
if (ret) {
pr_err("failed to allocate framebuffer (%d)\n", aligned_size);
return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d028806..b8e8d67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/d

Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-25 Thread Felix Kuehling
That's clever. I was scratching my head where the BOs were getting
validated, just by sharing the VM reservation object. Until I carefully
read your previous commit.

In general, I find the VM code extremely frustrating and confusing to
review. Too many lists of different things, and it's really hard to keep
track of which list tracks what type of object, in which situation.

For example, take struct amdgpu_vm:

/* BOs who needs a validation */
struct list_headevicted;

/* BOs moved, but not yet updated in the PT */
struct list_headmoved;

/* BO mappings freed, but not yet updated in the PT */
struct list_headfreed;

Three lists of BOs (according to the comments). But evicted and moved
are lists of amdgpu_vm_bo_base, freed is a list of amdgpu_bo_va_mapping.
moved and freed are used for tracking BOs mapped in the VM. I think
moved may also track page table BOs, but I'm not sure. evicted is used
only for tracking page table BOs.

In patch #7 you add relocated to the mix. Now it gets really funny.
What's the difference between relocated and evicted and moved? It seems
PT BOs can be on any of these lists. I think evicted means the BO needs
to be validated. moved or relocated means it's been validated but its
mappings must be updated. For PT BOs and mapped BOs that means different
things, so it makes sense to have different lists for them. But I think
PT BOs can also end up on the moved list when amdgpu_vm_bo_invalidate is
called for a page table BO (through amdgpu_bo_move_notify). So I'm still
confused.

I think this could be clarified with more descriptive names for the
lists. If PT BOs and mapped BOs must be tracked separately that should
be clear from the names.



Regards,
  Felix


On 2017-08-25 05:38 AM, Christian König wrote:
> From: Christian König 
>
> Add the IOCTL interface so that applications can allocate per VM BOs.
>
> Still WIP since not all corner cases are tested yet, but this reduces average
> CS overhead for 10K BOs from 21ms down to 48us.
>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  7 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c|  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 59 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c |  3 +-
>  include/uapi/drm/amdgpu_drm.h |  2 ++
>  5 files changed, 51 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index b1e817c..21cab36 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -457,9 +457,10 @@ struct amdgpu_sa_bo {
>   */
>  void amdgpu_gem_force_release(struct amdgpu_device *adev);
>  int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
> - int alignment, u32 initial_domain,
> - u64 flags, bool kernel,
> - struct drm_gem_object **obj);
> +  int alignment, u32 initial_domain,
> +  u64 flags, bool kernel,
> +  struct reservation_object *resv,
> +  struct drm_gem_object **obj);
>  
>  int amdgpu_mode_dumb_create(struct drm_file *file_priv,
>   struct drm_device *dev,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> index 0e907ea..7256f83 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
> @@ -144,7 +144,7 @@ static int amdgpufb_create_pinned_object(struct 
> amdgpu_fbdev *rfbdev,
>  AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
>  AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
>  AMDGPU_GEM_CREATE_VRAM_CLEARED,
> -true, &gobj);
> +true, NULL, &gobj);
>   if (ret) {
>   pr_err("failed to allocate framebuffer (%d)\n", aligned_size);
>   return -ENOMEM;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index d028806..b8e8d67 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -44,11 +44,12 @@ void amdgpu_gem_object_free(struct drm_gem_object *gobj)
>  }
>  
>  int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
> - int alignment, u32 initial_domain,
> - u64 flags, bool kernel,
> - struct drm_gem_object **obj)
> +  int alignment, u32 initial_domain,
> +  u64 flags, bool kernel,
> +  struct reservation_object *resv,
> +  struct drm_gem_object *

Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-25 Thread Christian König

On 25.08.2017 at 18:22, Marek Olšák wrote:

On Fri, Aug 25, 2017 at 3:00 PM, Christian König
 wrote:

On 25.08.2017 at 12:32, zhoucm1 wrote:



On 2017-08-25 17:38, Christian König wrote:

From: Christian König 

Add the IOCTL interface so that applications can allocate per VM BOs.

Still WIP since not all corner cases are tested yet, but this reduces
average
CS overhead for 10K BOs from 21ms down to 48us.

Wow, cheers, you finally get per-VM BOs into the same reservation object as
the PD/PTs; that indeed saves a lot of BO list overhead.


Don't cheer too loudly yet, that is a completely constructed test case.

So far I wasn't able to achieve any improvement with any real game on this
with Mesa.

BTW: Marek can you take a look with some CPU bound tests? I can provide a
kernel branch if necessary.

Do you have a branch that works on Raven? This patch series doesn't,
and I didn't investigate why.


I will come up with one on Monday.

Have a nice weekend guys,
Christian.



Marek


Regards,
Christian.



overall it looks good; I will take a detailed look at it tomorrow.

Regards,
David Zhou


[patch snipped]

Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-25 Thread Marek Olšák
On Fri, Aug 25, 2017 at 3:00 PM, Christian König
 wrote:
> Am 25.08.2017 um 12:32 schrieb zhoucm1:
>>
>>
>>
>> On 2017-08-25 17:38, Christian König wrote:
>>>
>>> From: Christian König 
>>>
>>> Add the IOCTL interface so that applications can allocate per VM BOs.
>>>
>>> Still WIP since not all corner cases are tested yet, but this reduces
>>> average
>>> CS overhead for 10K BOs from 21ms down to 48us.
>>
>> Wow, cheers, you finally get per-VM BOs into the same reservation object as
>> the PD/PTs; that indeed saves a lot of BO list overhead.
>
>
> Don't cheer too loudly yet, that is a completely constructed test case.
>
> So far I wasn't able to achieve any improvement with any real game on this
> with Mesa.
>
> BTW: Marek can you take a look with some CPU bound tests? I can provide a
> kernel branch if necessary.

Do you have a branch that works on Raven? This patch series doesn't,
and I didn't investigate why.

Marek

>
> Regards,
> Christian.
>
>
>> overall it looks good; I will take a detailed look at it tomorrow.
>>
>> Regards,
>> David Zhou
>>> [patch snipped]

Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-25 Thread Christian König

On 25.08.2017 at 12:32, zhoucm1 wrote:



On 2017-08-25 17:38, Christian König wrote:

From: Christian König 

Add the IOCTL interface so that applications can allocate per VM BOs.

Still WIP since not all corner cases are tested yet, but this reduces 
average

CS overhead for 10K BOs from 21ms down to 48us.
Wow, cheers, you finally get per-VM BOs into the same reservation object as
the PD/PTs; that indeed saves a lot of BO list overhead.


Don't cheer too loudly yet, that is a completely constructed test case.

So far I wasn't able to achieve any improvement with any real game on
this with Mesa.


BTW: Marek can you take a look with some CPU bound tests? I can provide 
a kernel branch if necessary.


Regards,
Christian.


overall it looks good; I will take a detailed look at it tomorrow.

Regards,
David Zhou


[patch snipped]

Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-25 Thread zhoucm1



On 2017-08-25 17:38, Christian König wrote:

From: Christian König 

Add the IOCTL interface so that applications can allocate per VM BOs.

Still WIP since not all corner cases are tested yet, but this reduces average
CS overhead for 10K BOs from 21ms down to 48us.
Wow, cheers, you finally get per-VM BOs into the same reservation object as
the PD/PTs; that indeed saves a lot of BO list overhead.


overall it looks good; I will take a detailed look at it tomorrow.

Regards,
David Zhou


[patch snipped]

[PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-25 Thread Christian König
From: Christian König 

Add the IOCTL interface so that applications can allocate per VM BOs.

Still WIP since not all corner cases are tested yet, but this reduces average
CS overhead for 10K BOs from 21ms down to 48us.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  7 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 59 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c |  3 +-
 include/uapi/drm/amdgpu_drm.h |  2 ++
 5 files changed, 51 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b1e817c..21cab36 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -457,9 +457,10 @@ struct amdgpu_sa_bo {
  */
 void amdgpu_gem_force_release(struct amdgpu_device *adev);
 int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
-   int alignment, u32 initial_domain,
-   u64 flags, bool kernel,
-   struct drm_gem_object **obj);
+int alignment, u32 initial_domain,
+u64 flags, bool kernel,
+struct reservation_object *resv,
+struct drm_gem_object **obj);
 
 int amdgpu_mode_dumb_create(struct drm_file *file_priv,
struct drm_device *dev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
index 0e907ea..7256f83 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
@@ -144,7 +144,7 @@ static int amdgpufb_create_pinned_object(struct amdgpu_fbdev *rfbdev,
   AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
   AMDGPU_GEM_CREATE_VRAM_CLEARED,
-  true, &gobj);
+  true, NULL, &gobj);
if (ret) {
pr_err("failed to allocate framebuffer (%d)\n", aligned_size);
return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d028806..b8e8d67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -44,11 +44,12 @@ void amdgpu_gem_object_free(struct drm_gem_object *gobj)
 }
 
 int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
-   int alignment, u32 initial_domain,
-   u64 flags, bool kernel,
-   struct drm_gem_object **obj)
+int alignment, u32 initial_domain,
+u64 flags, bool kernel,
+struct reservation_object *resv,
+struct drm_gem_object **obj)
 {
-   struct amdgpu_bo *robj;
+   struct amdgpu_bo *bo;
int r;
 
*obj = NULL;
@@ -59,7 +60,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
 
 retry:
r = amdgpu_bo_create(adev, size, alignment, kernel, initial_domain,
-flags, NULL, NULL, 0, &robj);
+flags, NULL, resv, 0, &bo);
if (r) {
if (r != -ERESTARTSYS) {
if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
@@ -71,7 +72,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
}
return r;
}
-   *obj = &robj->gem_base;
+   *obj = &bo->gem_base;
 
return 0;
 }
@@ -136,13 +137,14 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,
struct amdgpu_vm *vm = &fpriv->vm;
 
struct amdgpu_bo_list_entry vm_pd;
-   struct list_head list;
+   struct list_head list, duplicates;
struct ttm_validate_buffer tv;
struct ww_acquire_ctx ticket;
struct amdgpu_bo_va *bo_va;
int r;
 
INIT_LIST_HEAD(&list);
+   INIT_LIST_HEAD(&duplicates);
 
tv.bo = &bo->tbo;
tv.shared = true;
@@ -150,7 +152,7 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,
 
amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);
 
-   r = ttm_eu_reserve_buffers(&ticket, &list, false, NULL);
+   r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
if (r) {
dev_err(adev->dev, "leaking bo va because "
"we fail to reserve bo (%d)\n", r);
@@ -185,9 +187,12 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
struct drm_file *filp)
 {
struct amdgpu_device *adev = dev->dev_private;
+   struct amdgpu_fpriv *fpriv = filp->driver_pr