That was a good hint. glmark2 sees a really nice 5% improvement with this change.

Christian.

On 30.08.2017 at 02:27, Marek Olšák wrote:
It might be interesting to try glmark2.

Marek

On Tue, Aug 29, 2017 at 3:59 PM, Christian König
<deathsim...@vodafone.de> wrote:
Ok, found something that works. Xonotic in lowest resolution, lowest effects
quality (i.e. totally CPU bound):

Without per process BOs:

Xonotic 0.8:
     pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
     Test 1 of 1
     Estimated Trial Run Count:    3
     Estimated Time To Completion: 3 Minutes
         Started Run 1 @ 21:13:50
         Started Run 2 @ 21:14:57
         Started Run 3 @ 21:16:03  [Std. Dev: 0.94%]

     Test Results:
         187.436577
         189.514724
         190.9605812

     Average: 189.30 Frames Per Second
     Minimum: 131
     Maximum: 355

With per process BOs:

Xonotic 0.8:
     pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
     Test 1 of 1
     Estimated Trial Run Count:    3
     Estimated Time To Completion: 3 Minutes
         Started Run 1 @ 21:20:05
         Started Run 2 @ 21:21:07
         Started Run 3 @ 21:22:10  [Std. Dev: 1.49%]

     Test Results:
         203.0471676
         199.6622532
         197.0954183

     Average: 199.93 Frames Per Second
     Minimum: 132
     Maximum: 349

Well that looks like some improvement.

Regards,
Christian.
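
For reference, the averages and the relative gain can be recomputed from the per-run numbers above. This is only a sketch of the arithmetic: the variable and function names are made up for illustration, and the inputs are just the three runs per configuration reported by the test suite.

```python
from statistics import mean

# Per-run FPS values copied from the two test reports above.
without_per_process_bos = [187.436577, 189.514724, 190.9605812]
with_per_process_bos = [203.0471676, 199.6622532, 197.0954183]


def relative_gain(baseline, candidate):
    """Relative change of the candidate mean over the baseline mean."""
    base = mean(baseline)
    return (mean(candidate) - base) / base


avg_without = mean(without_per_process_bos)
avg_with = mean(with_per_process_bos)
gain = relative_gain(without_per_process_bos, with_per_process_bos)

print(f"without per process BOs: {avg_without:.2f} FPS")
print(f"with per process BOs:    {avg_with:.2f} FPS")
print(f"relative gain:           {gain * 100:.1f}%")
```

With these inputs the gain comes out at roughly 5.6%, which is larger than the run-to-run standard deviations reported above (0.94% and 1.49%). Three runs per configuration is still a small sample.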


On 28.08.2017 at 14:59, Zhou, David(ChunMing) wrote:

I will push our Vulkan guys to test it; their BO list is very long.

Sent from Smartisan Nut Pro

Christian König <deathsim...@vodafone.de> wrote on 2017-08-28 at 7:55 PM:

On 28.08.2017 at 06:21, zhoucm1 wrote:

On 2017-08-27 18:03, Christian König wrote:
On 25.08.2017 at 21:19, Christian König wrote:
On 25.08.2017 at 18:22, Marek Olšák wrote:
On Fri, Aug 25, 2017 at 3:00 PM, Christian König
<deathsim...@vodafone.de> wrote:
On 25.08.2017 at 12:32, zhoucm1 wrote:

On 2017-08-25 17:38, Christian König wrote:
From: Christian König <christian.koe...@amd.com>

Add the IOCTL interface so that applications can allocate per VM
BOs.

Still WIP since not all corner cases are tested yet, but this reduces
average CS overhead for 10K BOs from 21ms down to 48us.
Wow, cheers. So you finally got the per-VM BOs onto the same reservation
as the PD/PTs, which indeed saves a lot on the BO list.
Don't cheer too loud yet, that is a completely constructed test case.

So far I wasn't able to achieve any improvement with any real game on
this with Mesa.
Thinking about it more: too many BOs share one reservation, which could
mean the reservation lock is often busy. If eviction or destroy also
happens often in the meantime, that could affect VM updates and CS
submission as well.
That's exactly the reason why I've added code to the BO destroy path to
avoid at least some of the problems. But yeah, that's only the tip of
the iceberg of problems with that approach.

Anyway, this is a very good start at reducing CS overhead, especially
since we've seen that it "reduces average CS overhead for 10K BOs from
21ms down to 48us".
Actually, it's not that good. This is a completely constructed test
case on a kernel with lockdep and KASAN enabled.

In reality we usually don't have so many BOs and so far I wasn't able to
find much of an improvement in any real world testing.

Regards,
Christian.
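
To make the trade-off discussed above concrete, here is a toy model, not kernel code. It assumes that command submission has to lock every distinct reservation referenced by the BO list, and it compares 10K BOs that each carry their own reservation with 10K BOs that all share a single one (as the per VM BOs do with the PD/PTs). The names `ReservationObject`, `BufferObject` and `submit` are hypothetical and do not correspond to amdgpu functions.

```python
import threading
from contextlib import ExitStack


class ReservationObject:
    """Hypothetical stand-in for a reservation: a lock plus a counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


class BufferObject:
    """Toy BO that either owns a private reservation or shares one."""

    def __init__(self, resv=None):
        self.resv = resv if resv is not None else ReservationObject()


def submit(bos):
    """Lock each distinct reservation once and return how many were locked."""
    distinct = {id(bo.resv): bo.resv for bo in bos}
    with ExitStack() as stack:
        for resv in distinct.values():
            stack.enter_context(resv)
        # A real submission would do its work here, with all locks held.
    return len(distinct)


N_BOS = 10_000

# Every BO has its own reservation: one lock per BO per submission.
private_bos = [BufferObject() for _ in range(N_BOS)]
locks_private = submit(private_bos)

# All BOs share one reservation: a single lock per submission.
shared_resv = ReservationObject()
shared_bos = [BufferObject(shared_resv) for _ in range(N_BOS)]
locks_shared = submit(shared_bos)

print(f"private reservations: {locks_private} locks taken")
print(f"shared reservation:   {locks_shared} lock taken")
```

The model only counts lock acquisitions. It says nothing about how long the lock is held, and that is exactly where the concern raised above comes in: with a single shared reservation, eviction, BO destroy, VM updates and CS all compete for the same lock.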


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


