Re: [Mesa-dev] [PATCH 0/8] Gallium & RadeonSI optimization for Ryzen CPUs

Axel Davy Thu, 06 Sep 2018 13:56:44 -0700

Yeah, by pinning to cores, I meant pinning to a group of cores.

I think a reasonable policy would be for the kernel to put all threads of a given process on the same L3 as long as the number of threads is smaller than the L3 group size.

When there are more threads, I guess it would need heuristics to pick which threads to put together.

I fear that if we begin to do the work manually, there won't be any interest in doing it in the kernel, and thus all applications will need to include such core-pinning code to get good performance when multithreaded.
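The placement policy proposed above can be made concrete with a small sketch. It is hypothetical: `l3_groups_needed` and `l3_group_for_thread` are illustrative names, not a kernel or Mesa API, and `cpus_per_l3` (e.g. 8 for a 4C/8T CCX) is assumed to be known. The "heuristic" for the overflow case here is deliberately naive (fill one L3, then the next, in thread order); a real one would need to know which threads talk to each other.

```c
#include <assert.h>

/* Hypothetical helper: how many L3 groups (CCXs) a process with
 * `num_threads` busy threads would occupy under the policy
 * "keep every thread on one L3 while they fit, otherwise fill
 * L3 groups one after another". */
int l3_groups_needed(int num_threads, int cpus_per_l3)
{
    assert(num_threads >= 0 && cpus_per_l3 > 0);
    if (num_threads <= cpus_per_l3)
        return 1; /* everything fits on a single L3 */
    /* ceiling division: each L3 is filled before spilling to the next */
    return (num_threads + cpus_per_l3 - 1) / cpus_per_l3;
}

/* Hypothetical naive placement: thread index t goes to L3 group
 * t / cpus_per_l3. It ignores which threads actually communicate,
 * which is exactly the information the kernel does not have. */
int l3_group_for_thread(int t, int cpus_per_l3)
{
    assert(t >= 0 && cpus_per_l3 > 0);
    return t / cpus_per_l3;
}
```

For example, with 8 hardware threads per L3, a 6-thread process stays on one L3, while a 12-thread process is spread over two, with threads 0-7 on the first and 8-11 on the second.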

Axel

On 9/6/18 9:21 PM, Marek Olšák wrote:

Actually, you make a good point about the kernel, but the kernel has
no visibility into which threads need to be coupled together. So the
kernel can't do anything.

Marek

On Thu, Sep 6, 2018 at 2:24 PM, Marek Olšák <mar...@gmail.com> wrote:

I think you are missing the point. This series doesn't pin threads to
cores. It pins threads to one L3, which can have 4 or 8 cores.

Marek

On Thu, Sep 6, 2018 at 5:22 AM, Axel Davy <davyax...@gmail.com> wrote:

Hi Marek,

Shouldn't this core pinning be handled by the kernel?

Otherwise, all multithreaded games (or applications) will need an update.

I also see a risk in applications handling the core pinning: several
intensive applications may pin the same cores. The kernel would be able
to switch the pinned cores automatically if the load could be better
shared among cores.

Yours,

Axel Davy


On 9/6/18 6:02 AM, Marek Olšák wrote:

Hi,

When the Ryzen CPUs were launched, they didn't perform very well in
games, and it took a while before games were patched. Guess what,
Mesa drivers have suffered from the same inefficiencies until now.

The AMD Zen architecture has multiple core complexes (CCX) where each
CCX has e.g. 4C/8T and always one L3 cache. If application and driver
threads don't run on the same CCX, communication between threads is
slow, because multiple L3 caches must maintain coherency between them.
Atomic operations seem to suffer the most, almost as if they were
uncached. (are they?)
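On Linux, the set of hardware threads that share one L3 (i.e. one CCX) can be read from sysfs. The sketch below assumes the standard sysfs cache layout and that cache `index3` of a CPU is its L3 (the `level` file in the same directory can confirm this). `shared_cpu_list` holds a comma-separated list of CPU ranges such as `0-3,8-11`, and the exact numbering depends on the system. `parse_cpu_list` and `l3_cpus_of` are hypothetical helper names; this is not how Mesa itself detects the topology.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a Linux CPU list such as "0-3,8-11\n" into a cpu_set_t.
 * Returns 0 on success, -1 on malformed input. */
int parse_cpu_list(const char *s, cpu_set_t *set)
{
    assert(s != NULL && set != NULL);
    CPU_ZERO(set);
    while (*s != '\0' && *s != '\n') {
        char *end;
        long first = strtol(s, &end, 10);
        if (end == s || first < 0)
            return -1;
        long last = first;
        if (*end == '-') { /* a range "a-b" */
            s = end + 1;
            last = strtol(s, &end, 10);
            if (end == s || last < first)
                return -1;
        }
        for (long cpu = first; cpu <= last && cpu < CPU_SETSIZE; cpu++)
            CPU_SET((int)cpu, set);
        s = end;
        if (*s == ',')
            s++;
        else if (*s != '\0' && *s != '\n')
            return -1;
    }
    return 0;
}

/* Fill `set` with the CPUs sharing the L3 cache of `cpu`.
 * Assumes cache "index3" is the L3. Returns 0 on success, -1 on error. */
int l3_cpus_of(int cpu, cpu_set_t *set)
{
    char path[128], buf[256];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_list", cpu);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, sizeof(buf), f);
    fclose(f);
    return ok ? parse_cpu_list(buf, set) : -1;
}
```

Calling `l3_cpus_of(sched_getcpu(), &set)` from a thread gives the L3 group it is currently running on; threads on CPUs outside that set are on a different CCX.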

This series pins the application thread and all driver execution
threads to 1 L3 cache (1 CCX). If the application thread is already
pinned to a hw thread or core(s), all driver threads are pinned to
the same L3 cache (CCX) as the application thread.

Shader compiler threads are unpinned, as they are not critical.
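The pinning rule described above can be sketched with the glibc affinity calls `pthread_getaffinity_np` and `pthread_setaffinity_np`. This is a minimal illustration under stated assumptions, not the code in the series: the per-L3 masks are assumed to have been discovered already (e.g. from sysfs), `choose_l3` and `pin_to_l3` are hypothetical names, and choosing the first L3 that the application thread may run on is a simplification. Shader compiler threads are simply left out of the `threads` array, so they keep their default affinity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Index of the first L3 group containing any CPU the application
 * thread may run on, or -1 if none does. */
int choose_l3(const cpu_set_t *app_affinity,
              const cpu_set_t *l3_masks, int num_l3)
{
    assert(app_affinity != NULL && l3_masks != NULL);
    for (int i = 0; i < num_l3; i++) {
        cpu_set_t both;
        CPU_AND(&both, app_affinity, &l3_masks[i]);
        if (CPU_COUNT(&both) > 0)
            return i;
    }
    return -1;
}

/* Pin the driver threads to the L3 of the application thread `app`.
 * If `app` is already confined to that L3 (e.g. pinned to one hw thread),
 * its affinity is left alone; otherwise it is pinned to the whole L3.
 * Returns 0 on success or an errno value. */
int pin_to_l3(pthread_t app, const pthread_t *threads, int num_threads,
              const cpu_set_t *l3_masks, int num_l3)
{
    cpu_set_t app_affinity, inside;
    int err = pthread_getaffinity_np(app, sizeof(app_affinity), &app_affinity);
    if (err)
        return err;
    int l3 = choose_l3(&app_affinity, l3_masks, num_l3);
    if (l3 < 0)
        return EINVAL;
    CPU_AND(&inside, &app_affinity, &l3_masks[l3]);
    if (!CPU_EQUAL(&inside, &app_affinity)) {
        err = pthread_setaffinity_np(app, sizeof(cpu_set_t), &l3_masks[l3]);
        if (err)
            return err;
    }
    for (int i = 0; i < num_threads; i++) {
        err = pthread_setaffinity_np(threads[i], sizeof(cpu_set_t), &l3_masks[l3]);
        if (err)
            return err;
    }
    return 0;
}
```

A caller on the application thread would do something like `pin_to_l3(pthread_self(), driver_threads, n, l3_masks, num_l3)`, where `driver_threads` and `l3_masks` are hypothetical variables. Restricting the driver threads to the whole L3 rather than to single cores keeps the scheduler free to balance them within the CCX.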

The piglit/drawoverhead microbenchmark shows that this increases
performance by 32% for DrawElements and 25% for DrawArrays on Ryzen
1st-Gen CPUs. It will probably be much less with real apps.

Please review.

Thanks,
Marek
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

