On 2017-03-27 17:29, Christian König wrote:
On APUs I've already enabled using direct access to the stolen parts of
system memory. So there won't be any eviction any more because of page
faults on APUs.

Regards,
Christian.

Thanks, could you point out where this is done?

Regards,
David Zhou

On 27.03.2017 at 09:53, Zhou, David (ChunMing) wrote:
For the APU special case, can we prevent evictions between VRAM <----> GTT?

Regards,
David Zhou

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of Michel Dänzer
Sent: Monday, March 27, 2017 3:36 PM
To: Marek Olšák <mar...@gmail.com>
Cc: amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
Subject: Re: Plan: BO move throttling for visible VRAM evictions

On 25/03/17 01:33 AM, Marek Olšák wrote:
Hi,

I'm sharing this idea here, because it's something that has been
decreasing our performance a lot recently, for example:
http://openbenchmarking.org/prospect/1703011-RI-RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa

I think the problem there is that Mesa git started uploading
descriptors and uniforms to VRAM, which helps when TC L2 has a low
hit/miss ratio, but the performance can randomly drop by an order of
magnitude. I've heard rumours that kernel 4.11 has an improved
allocator that should perform better, but the situation is still far
from ideal.

AMD CPUs and APUs will hopefully suffer less, because we can resize
the visible VRAM with the help of our CPU hw specs, but Intel CPUs
will remain limited to 256 MB. The following plan describes how to do
throttling for visible VRAM evictions.


1) Theory

Initially, the driver doesn't care about where buffers are in VRAM,
because VRAM buffers are only moved to visible VRAM on CPU page faults
(when the CPU touches the buffer memory but the memory is in the
invisible part of VRAM). When it happens,
amdgpu_bo_fault_reserve_notify is called, which moves the buffer to
visible VRAM, and the app continues. amdgpu_bo_fault_reserve_notify
also marks the buffer as contiguous, which makes memory fragmentation
worse.
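
For reference, the logic described above boils down to roughly the
following (a simplified, untested sketch, not the verbatim kernel code;
the real function additionally marks the BO as contiguous):

static int fault_move_to_visible(struct amdgpu_bo *abo)
{
    struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
    unsigned long visible_pfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
    unsigned int i;

    /* Clamp every VRAM placement to the CPU-visible aperture, so the
     * following validate has to move the BO there. */
    amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM);
    for (i = 0; i < abo->placement.num_placement; i++) {
        if (!abo->placements[i].lpfn ||
            abo->placements[i].lpfn > visible_pfn)
            abo->placements[i].lpfn = visible_pfn;
    }

    /* This move is what dominates the CPU profile in DiRT Rally. */
    return ttm_bo_validate(&abo->tbo, &abo->placement, false, false);
}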

I verified this with DiRT Rally where amdgpu_bo_fault_reserve_notify
was much higher in a CPU profiler than anything else in the kernel.


2) Monitoring via Gallium HUD

We need to expose 2 kernel counters via the INFO ioctl and display
those via Gallium HUD:
- The number of VRAM CPU page faults (i.e. the number of calls to
amdgpu_bo_fault_reserve_notify).
- The number of bytes moved by ttm_bo_validate inside
amdgpu_bo_fault_reserve_notify.

This will help us observe what exactly is happening and fine-tune the
throttling when it's done.
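
On the userspace side, the HUD could read both counters through
libdrm's amdgpu_query_info(); the two query IDs below are placeholders
for UAPI that doesn't exist yet:

#include <stdint.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Placeholder query IDs; real values would be assigned in amdgpu_drm.h. */
#define AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS  0x50
#define AMDGPU_INFO_VRAM_FAULT_BYTES_MOVED    0x51

static int query_vram_fault_counters(amdgpu_device_handle dev,
                                     uint64_t *num_faults,
                                     uint64_t *bytes_moved)
{
    int r;

    r = amdgpu_query_info(dev, AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS,
                          sizeof(*num_faults), num_faults);
    if (r)
        return r;

    return amdgpu_query_info(dev, AMDGPU_INFO_VRAM_FAULT_BYTES_MOVED,
                             sizeof(*bytes_moved), bytes_moved);
}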


3) Solution

a) When amdgpu_bo_fault_reserve_notify is called, record the fact.
(amdgpu_bo::had_cpu_page_fault = true)

b) Monitor the MB/s rate at which buffers are moved by
amdgpu_bo_fault_reserve_notify. If we get above a specific threshold,
don't move the buffer to visible VRAM. Move it to GTT instead. Note
that moving to GTT can be cheaper, because moving to visible VRAM is
likely to evict a lot of buffers there and unmap them from the CPU,
but moving to GTT shouldn't evict or unmap anything.

FWIW, this can be avoided by only setting GTT in busy_placement. Then
TTM will only move the BO to visible VRAM if that can be done without
evicting anything from there.
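
Something like this could cover both a) and b); had_cpu_page_fault,
the rate helper and the busy_placement handling below are illustrative
assumptions, not existing amdgpu code:

static int fault_notify_throttled(struct amdgpu_bo *abo, u64 bytes_to_move)
{
    struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);

    /* (a): remember that this BO took a CPU page fault. */
    abo->had_cpu_page_fault = true;

    if (visible_vram_move_rate_too_high(adev, bytes_to_move)) {
        /* (b): over the MB/s threshold; go straight to GTT, which
         * shouldn't evict or unmap anything else. */
        amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
    } else {
        /* Prefer visible VRAM, but keep only GTT in busy_placement so
         * TTM falls back to GTT instead of evicting other BOs from the
         * visible aperture (placements[1] is the GTT entry with this
         * domain order). */
        amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM |
                                              AMDGPU_GEM_DOMAIN_GTT);
        abo->placement.busy_placement = &abo->placements[1];
        abo->placement.num_busy_placement = 1;
    }

    return ttm_bo_validate(&abo->tbo, &abo->placement, false, false);
}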

c) When we get into the CS ioctl and a buffer has had_cpu_page_fault,
it can be moved to VRAM if:
- the GTT->VRAM move rate is low enough to allow it (this is the
existing throttling mechanism)
- the visible VRAM move rate is low enough that we will be OK with
another CPU page fault if it happens.
Some other ideas that might be worth trying:

Evicting BOs to GTT instead of moving them to CPU accessible VRAM, in
some cases (e.g. for all BOs except those with
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) or even always.
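
As a sketch, the evict path could pick placements like this (simplified
and untested; based only on the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
check mentioned above):

static void evict_flags_sketch(struct amdgpu_bo *abo)
{
    if (abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)
        /* Must stay CPU mappable, so CPU accessible VRAM remains a
         * candidate, with GTT as the fallback. */
        amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM |
                                              AMDGPU_GEM_DOMAIN_GTT);
    else
        /* Everything else goes straight to GTT on eviction. */
        amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
}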

Implementing eviction from CPU visible to CPU invisible VRAM, similar
to how it's done in radeon. Note that there's potential for userspace
to trigger an infinite loop in the kernel in cases where BOs are moved
back from invisible to visible VRAM on page faults.
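
For illustration, the invisible-VRAM placement could be expressed like
this (untested sketch using the TTM placement flags of that time; a
real implementation would also need to break the fault/evict
ping-pong, e.g. by skipping BOs that already took a CPU page fault):

/* Restrict a VRAM placement to the CPU invisible part by starting it
 * past the visible aperture. */
struct ttm_place invisible_vram = {
    .fpfn = adev->mc.visible_vram_size >> PAGE_SHIFT,
    .lpfn = 0,  /* no upper limit: up to the end of VRAM */
    .flags = TTM_PL_FLAG_WC | TTM_PL_FLAG_UNCACHED | TTM_PL_FLAG_VRAM,
};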



