On 11/30/2012 05:30 PM, Jerome Glisse wrote:
On Fri, Nov 30, 2012 at 4:39 AM, Thomas Hellstrom <tho...@shipmail.org> wrote:
On 11/29/2012 10:58 PM, Marek Olšák wrote:

What I tried to point out was that the synchronization shouldn't be
needed, because the CPU shouldn't do anything with the contents of
evicted buffers. The GPU moves the buffers, not the CPU. What does the
CPU do besides updating some kernel structures?

Also, buffer deletion is something where you don't need to wait for
the buffer to become idle if you know the memory area won't be
mapped by the CPU, ever. The memory can be reclaimed right away. It
would be the GPU that moves new data in, and once that happens, the old
buffer will be trivially idle, because single-ring GPUs execute
commands in order.

Marek

Actually asynchronous eviction / deletion is something I have been
prototyping for a while but never gotten around to implementing in TTM:

There are a few minor caveats:

With buffer deletion, what you say is true for fixed memory, but not for TT
memory, where pages are reclaimed by the system after buffer destruction.
That means that we don't have to wait for idle to free GPU space, but we
need to wait before pages are handed back to the system.
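
To make the split concrete, here is a minimal sketch (all names are made up; this is not the actual TTM code): the GPU-space node goes back to the manager immediately, while the backing pages sit on a deferred list and are only handed back to the system once the eviction fence has signaled.

/* Illustrative only: simplified stand-in types, not real TTM structures. */
#include <stdbool.h>
#include <stdlib.h>

struct fence;                            /* opaque driver fence */
struct mem_node;                         /* GPU aperture space allocation */
struct page_list;                        /* system pages backing a TT buffer */

bool fence_signaled(struct fence *f);
void mem_manager_put_node(struct mem_node *node);
void release_pages_to_system(struct page_list *pages);

struct deferred_free {
        struct deferred_free *next;
        struct fence *eviction_fence;    /* last GPU use of these pages */
        struct page_list *pages;
};

static struct deferred_free *deferred_head;

/* On destruction the GPU space goes back to the manager immediately... */
void tt_buffer_destroy(struct mem_node *node, struct page_list *pages,
                       struct fence *eviction_fence)
{
        struct deferred_free *df = malloc(sizeof(*df));

        mem_manager_put_node(node);      /* GPU space reusable right away */

        /* ...but the pages only go back to the system once the GPU is
         * provably done with them. */
        df->eviction_fence = eviction_fence;
        df->pages = pages;
        df->next = deferred_head;
        deferred_head = df;
}

/* Reap entries whose fence has signaled (from a workqueue or fence callback). */
void tt_reap_deferred_frees(void)
{
        struct deferred_free **p = &deferred_head;

        while (*p) {
                struct deferred_free *df = *p;

                if (fence_signaled(df->eviction_fence)) {
                        *p = df->next;
                        release_pages_to_system(df->pages);
                        free(df);
                } else {
                        p = &df->next;
                }
        }
}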

Swapout needs to access the contents of evicted buffers, but synchronizing
doesn't need to happen until just before swapout.

Multi-ring - CPU support: If another ring / engine or the CPU is about to
move in buffer contents to VRAM or a GPU aperture that was previously
evicted by another ring, it needs to sync with that eviction, but doesn't
know what buffer or even which buffers occupied the space previously.
Trivially one can attach a sync object to the memory type manager that
represents the last eviction from that memory type, and *any* engine (CPU or
GPU) that moves buffer contents in needs to order that movement with respect
to that fence. As you say, with a single ring and no CPU fallbacks, that
ordering is a no-op, but any common (non-driver based) implementation needs
to support this.
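
Roughly, the single-fence variant could look like this (hypothetical names, plain C stand-ins rather than the real TTM structures): the manager remembers the fence of its most recent eviction, and whoever moves new contents in, a GPU ring or the CPU, first orders itself behind that fence.

/* Hypothetical sketch, not the real TTM interface or structures. */
struct fence;
struct ring;

void driver_order_ring_after_fence(struct ring *ring, struct fence *fence);
void driver_wait_fence(struct fence *fence);

struct mem_type_manager {
        /* Fence of the last eviction from this memory type (VRAM, GTT, ...). */
        struct fence *last_eviction_fence;
};

/* Called whenever any ring evicts a buffer out of this memory type. */
void manager_note_eviction(struct mem_type_manager *man, struct fence *fence)
{
        man->last_eviction_fence = fence;
}

/*
 * Called before moving new buffer contents into space owned by this
 * manager.  'mover' is the ring doing the move, or NULL for a CPU copy.
 * With a single ring and no CPU fallbacks the driver can turn this into
 * a no-op, since the ring already executes in order.
 */
void order_move_in(struct mem_type_manager *man, struct ring *mover)
{
        struct fence *f = man->last_eviction_fence;

        if (!f)
                return;

        if (mover)
                driver_order_ring_after_fence(mover, f); /* barrier/semaphore */
        else
                driver_wait_fence(f);                    /* CPU move: just wait */
}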

A single fence attached to the memory type manager is the simplest solution,
but a solution with a fence for each free region in the free list is also
possible. Then TTM needs a driver callback to be able to order fences w.r.t.
each other.
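
The per-free-region variant might look something like the sketch below (again invented names, not actual TTM code): each free-list entry carries the fence of the eviction that produced it, and when a placement spans several regions, the driver callback picks whichever fence signals last.

/* Hypothetical sketch; not the real TTM free-list code. */
struct fence;

/*
 * Driver callback: return whichever of the two fences signals last
 * (the driver knows how to compare or chain fences across rings).
 */
struct fence *driver_later_fence(struct fence *a, struct fence *b);

struct free_region {
        unsigned long start, size;
        struct fence *eviction_fence;   /* eviction that freed this region */
        struct free_region *next;
};

/*
 * A new buffer is being placed over the union of several free regions
 * (e.g. the space previously occupied by buffers A and B).  The move
 * must be ordered after whichever of their eviction fences signals last.
 */
struct fence *fence_for_placement(struct free_region *regions)
{
        struct fence *last = NULL;
        struct free_region *r;

        for (r = regions; r; r = r->next) {
                if (!r->eviction_fence)
                        continue;
                last = last ? driver_later_fence(last, r->eviction_fence)
                            : r->eviction_fence;
        }
        return last;    /* order the move (or a CPU wait) after this fence */
}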

/Thomas

Radeon already handles multi-ring and ttm interaction with what we call
semaphores. Semaphores are created to synchronize with fences across
different rings. I think the easiest solution is to just remove the bo
wait in ttm and let the driver handle this.

The wait can be removed, but only conditioned on a driver flag that says the driver supports asynchronous buffer moves.
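
Something along these lines, purely illustrative (the flag and helper names are made up): the wait stays the default and is only skipped when the driver declares it handles the ordering itself.

/* Purely illustrative; the flag and helpers are invented names. */
#include <stdbool.h>

struct buffer_object;

int bo_wait_idle(struct buffer_object *bo);
int driver_move_buffer(struct buffer_object *bo);

struct driver_caps {
        bool async_buffer_moves;   /* driver orders evictions/moves itself */
};

int bo_evict(struct buffer_object *bo, const struct driver_caps *caps)
{
        /*
         * Keep the wait as the default; skip it only when the driver has
         * declared that it orders the move against later users itself
         * (semaphores, barriers, CPU waits as needed).
         */
        if (!caps->async_buffer_moves) {
                int ret = bo_wait_idle(bo);
                if (ret)
                        return ret;
        }
        return driver_move_buffer(bo);
}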

The multi-ring case I'm talking about is:

Ring 1 evicts buffer A, emits fence 0
Ring 2 evicts buffer B, emits fence 1
...Other evictions take place on various rings, perhaps including ring 1 and ring 2. Ring 3 moves buffer C into the space which happens to be the union of the space previously occupied by buffer A and buffer B.

Question is: which fence do you want to order this move with?
The answer is whichever of fence 0 and 1 signals last.

I think it's a reasonable thing for TTM to keep track of this, but in order to do so it needs a driver callback that can order two fences, and can order a job in the current ring w.r.t. a fence. In radeon's case that driver callback would probably insert a barrier / semaphore. In the case of simpler hardware it would wait on one of the fences.
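
One possible shape of such a callback pair, sketched with invented names rather than a proposed interface: the core asks the driver which of two fences signals last, and asks it to make the current ring's job execute after a given fence; radeon would emit a semaphore there, simpler hardware would just wait on the fence.

/* Sketch of a possible driver hook; all names invented for illustration. */
struct fence;
struct ring;

struct sync_driver_ops {
        /*
         * Of two fences, return the one that signals last, so a move over
         * space freed by both evictions knows what to order against.
         */
        struct fence *(*later_fence)(struct fence *a, struct fence *b);

        /*
         * Make the next job on 'ring' execute only after 'fence' has
         * signaled.  On radeon-like hardware this would emit a semaphore
         * wait; simpler hardware would just stall on the fence.
         */
        int (*order_after)(struct ring *ring, struct fence *fence);
};

/*
 * Core-side use for the example above: order buffer C's move on ring 3
 * after whichever of fence 0 (ring 1) and fence 1 (ring 2) finishes last.
 */
int order_move(const struct sync_driver_ops *ops, struct ring *ring3,
               struct fence *fence0, struct fence *fence1)
{
        return ops->order_after(ring3, ops->later_fence(fence0, fence1));
}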

/Thomas


Cheers,
Jerome


