On Sat, 2007-12-22 at 10:19 +0100, Thomas Hellström wrote:

> There is indeed, although you and Eric have improved the situation 
> considerably.
> I hope there will be some time available to document things more 
> properly soon.

I think it works fairly well for me to try and document things as I
understand them; my only concern is that I will mis-document things when
I don't understand them correctly. If you're willing to review
documentation patches as I create them, that seems like it will work
fine.

> OK.
> However, it's not really the CPU unmapping / mapping itself that is 
> costly, but the
> caching state change of the global kernel page mapping.

Yes, this is where we see a huge performance improvement with the Intel
driver. I'm hoping this mechanism will become available for more drivers
as people understand the cost of changing page cache policies. I'm also
working with the Intel CPU designers to try and reduce this cost in our
products.
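
To make that cost concrete, here's roughly what the transition in
question looks like, written against today's x86 set_memory_*() helpers
(the current tree goes through change_page_attr() instead, so take the
names as illustrative rather than exact):

#include <asm/cacheflush.h>   /* set_memory_uc()/set_memory_wb() on x86 */

/* Illustrative only: pulling pages out of the cached kernel linear
 * mapping splits any large-page mappings covering them, shoots down
 * TLBs on every CPU and flushes cache lines for the range -- that,
 * rather than the map/unmap itself, is the expense being discussed. */
static int example_make_uncached(unsigned long kernel_vaddr, int npages)
{
        return set_memory_uc(kernel_vaddr, npages);
}

/* Moving the pages back to write-back pays a comparable global price. */
static int example_make_cached(unsigned long kernel_vaddr, int npages)
{
        return set_memory_wb(kernel_vaddr, npages);
}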

> While the Intel driver can currently work around this, the workaround 
> does create an illegal page aliasing (the same physical page mapped with 
> inconsistent caching attributes), which does upset some processors 
> (read: AMD).

The aliasing only occurs through the graphics aperture, which doesn't
seem like a significant issue to me, and is one we could 'fix' by
simply not mapping this portion of the aperture. For TTM mapped pages,
there is simply no good reason to ever access them through the graphics
address space.

>  Best would be to keep the low-performance but correct option as 
> the default for other drivers.

I think the key question is how we can ensure that cached memory writes
are flushed before the GPU reads them. For discrete graphics cards, it's
no different than any other DMA operation. For UMA graphics, we need to
know how to synchronize memory access between the memory controller and
graphics engine. We've got documentation for all of the Intel UMA
controllers now, and have managed to get 915 and later chips working.

I believe this makes it possible for us to never use WC writes to TTM
pages, something which will provide a significant performance increase
for all graphics environments.
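
For reference, the kind of flush I have in mind looks roughly like this
(drm_clflush_pages() is the helper name in today's DRM core; in the
current tree it would be an open-coded clflush loop, so treat this as a
sketch rather than the actual interface):

#include <drm/drm_cache.h>

/* Sketch: write back any dirty CPU cache lines covering a buffer's
 * pages before the GPU samples from them, so the CPU side can keep
 * using ordinary cached (WB) mappings instead of WC. */
static void example_flush_for_gpu(struct page **pages, unsigned long num_pages)
{
        /* clflush each cache line in the range, then a memory barrier */
        drm_clflush_pages(pages, num_pages);
}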

> Yup, that would be drm_bo_do_validate(). I think a possible 
> simplification would be to merge
> drm_bo_handle_validate() into drm_bo_setstatus_ioctl().

It's not quite the same -- drm_bo_do_validate waits for the buffer to
become unfenced, which I do not believe is necessary, as objects are
placed on the unfenced list only from execbuffer, which is atomic. It
should instead error out if the buffer is on the unfenced list. Which
reminds me: I think this should be called the 'pending_fence' list, as
these buffers are about to be fenced.

Merging drm_bo_handle_validate into drm_bo_setstatus_ioctl would be
great. Could we also merge drm_bo_do_validate into
drm_buffer_object_validate? I'd like to call this function
drm_bo_validate in keeping with the remaining function names.
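
Roughly what I have in mind for the merged entry point, with the caveat
that the structure and flag names below are only loosely borrowed from
the current buffer-object code and should be read as a sketch, not the
real interface:

/* Hypothetical merged validate path: refuse, rather than wait, when the
 * buffer is still awaiting its fence, since it can only get onto that
 * list from execbuffer, which is atomic. */
static int drm_bo_validate(struct drm_buffer_object *bo,
                           uint64_t flags, uint64_t mask, uint32_t hint)
{
        int ret;

        mutex_lock(&bo->mutex);

        if (bo->priv_flags & _DRM_BO_FLAG_UNFENCED) {
                ret = -EBUSY;           /* caller raced with execbuffer */
                goto out;
        }

        /* ... fold in the placement flags and perform any move/bind,
         * i.e. what drm_buffer_object_validate() does today (argument
         * list assumed here) ... */
        ret = drm_buffer_object_validate(bo, hint);
out:
        mutex_unlock(&bo->mutex);
        return ret;
}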

> That would probably be a solution that's better than the current sync flush.
> The IMHO *best* solution would have been to have the hardware engineers 
> add an IRQ to the sync_flush mechanism, something that's actually 
> available on every other sync flush...

Using the higher-priority queue is effectively the same as the existing
sync flush mechanism, and in fact far better as the hardware sync flush
requires that the queue be stopped while the flushing is going on. That
can all wait until we restructure the driver to support MI_SET_CONTEXT
next year. Meanwhile, simply placing an MI_FLUSH at each fence will
suffice, and it eliminates the polling complexity.
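
For concreteness, that per-fence flush is just a couple of extra dwords
in the ring, something like the following (the macro and register names
here are from memory and from today's i915 driver, so double-check them
against the tree before treating this as more than a sketch):

static void example_emit_flushed_breadcrumb(struct drm_device *dev,
                                            uint32_t seqno)
{
        drm_i915_private_t *dev_priv = dev->dev_private;
        RING_LOCALS;

        BEGIN_LP_RING(6);
        OUT_RING(MI_FLUSH);             /* flush render caches to memory */
        OUT_RING(MI_NOOP);
        OUT_RING(MI_STORE_DWORD_INDEX); /* write the breadcrumb ... */
        OUT_RING(I915_BREADCRUMB_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
        OUT_RING(seqno);                /* ... into the hw status page */
        OUT_RING(0);
        ADVANCE_LP_RING();
}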

> OK. I'll see if I can take a quick shot at a proper renaming / 
> documentation of these.

A short description that suffices to explain it to me would be
appreciated; I would then extend that into longer prose throughout the
code.

> In fact there are quite a few hardware implementations of functions that 
> would benefit from an interrupt (SiS, Unichrome, Intel sync flush, 
> Poulsbo 2D queue space available..) but require hardware polling.  I 
> agree that the fence flush implementation is hairy and perhaps overly 
> complex, but I think we need to keep the polling functionality, and just 
> not use it in drivers that do not need polling.

I don't mind making some drivers do polling, but at this point, the
DRM_WAIT_ON macro is used in the generic code, and that cannot be
switched to avoid polling.

Perhaps the wait functionality should be moved into the driver so that
those which require polling can poll while those which do not can use
blocking mechanisms.
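
Something along these lines, where the choice between polling and
blocking lives behind a per-driver hook (the field and helper names
here -- fence_has_irq, fence_queue, fence_signaled() -- are made up for
the sake of the sketch):

static int example_fence_wait(struct drm_fence_object *fence,
                              struct drm_device *dev)
{
        long ret;

        if (dev->driver->fence_has_irq) {
                /* Hardware with a completion IRQ can simply sleep. */
                ret = wait_event_interruptible_timeout(dev->fence_queue,
                                                       fence_signaled(fence),
                                                       3 * DRM_HZ);
                if (ret < 0)
                        return ret;             /* interrupted by a signal */
                return ret ? 0 : -EBUSY;        /* 0 means we timed out */
        }

        /* Polling fallback for hardware that has no IRQ to offer. */
        while (!fence_signaled(fence)) {
                if (signal_pending(current))
                        return -EINTR;
                schedule_timeout_interruptible(1);
        }
        return 0;
}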

> Hmm, If I understand the docs correctly it's not OK to evict a texture 
> unless a read flush has been performed, even if the batchbuffer trailing 
> breadcrumb has been written? This impacts texture thrashing situations 
> and unexpected client deaths.

That's probably weasel words in the docs to allow for read-ahead in the
textures. We've recently managed to get the Intel CPU architects to
tighten their memory access documentation; perhaps we can ask the
graphics architects to do the same. That might let us reduce the amount
of flushing going on for eviction. Of course, I'm hoping that eviction
becomes a fairly rare event -- if you've got enough memory to hold the
textures, you should have enough GTT space to map them. The G33 aperture
can be set to 2GB, which is generally a substantial fraction of the
available physical memory on a machine.

> Before implementing a lot of code to avoid the MI_FLUSH after every 
> batch buffer, are there any benchmarks available that really show that 
> it's a big gain? When I tested with the i915tex driver on i915s I saw no 
> such big improvements, but admittedly I tested only a limited set of 
> programs that didn't use textures, to avoid triggering the sync_flush 
> mechanism.

Right, we haven't any credible performance data in this area. Right now,
flushing the cache is fairly inexpensive, but caches will only get
bigger, and the penalty for losing context higher. So, I'd like to make
sure our architecture supports this, even if our current chips don't see
a huge benefit.

Thanks for taking time to respond.

-- 
[EMAIL PROTECTED]
