On Sat, 2007-12-22 at 10:19 +0100, Thomas Hellström wrote:

> There is indeed, although you and Eric have improved the situation
> considerably.
> I hope there will be some time available to document things more
> properly soon.
I think it works fairly well for me to try and document things as I
understand them; my only concern is that I will mis-document things when
I don't understand them correctly. If you're willing to review
documentation patches as I create them, that seems like it will work
fine.

> OK.
> However, it's not really the CPU unmapping / mapping itself that is
> costly, but the caching state change of the global kernel page mapping.

Yes, this is where we see a huge performance improvement with the Intel
driver. I'm hoping this mechanism will become available for more drivers
as people understand the cost of changing page cache policies. I'm also
working with the Intel CPU designers to try to reduce this cost in our
products.

> While the Intel driver can currently work around this, the workaround
> does create an illegal page aliasing (the same physical page mapped
> with inconsistent caching attributes), which does upset some
> processors (read: AMD).

The aliasing only occurs through the graphics aperture, which doesn't
seem like a significant issue to me, and one which we could 'fix' by
simply not mapping this portion of the aperture. For TTM-mapped pages,
there is simply no good reason to ever access them through the graphics
address space.

> Best would be to keep the low-performance but correct option as
> the default for other drivers.

I think the key question is how we can ensure that cached memory writes
are flushed before the GPU reads them. For discrete graphics cards, it's
no different from any other DMA operation. For UMA graphics, we need to
know how to synchronize memory access between the memory controller and
the graphics engine. We've got documentation for all of the Intel UMA
controllers now, and have managed to get 915 and later chips working. I
believe this makes it possible for us to never use WC writes to TTM
pages, something which will provide a significant performance increase
for all graphics environments.

> Yup, that would be drm_bo_do_validate(). I think a possible
> simplification would be to merge drm_bo_handle_validate() into
> drm_bo_setstatus_ioctl().

It's not quite that -- drm_bo_do_validate waits for the buffer to become
unfenced, which I do not believe is necessary, as objects are placed on
the unfenced list only from execbuffer, which is atomic. It should
error out if the buffer is on the unfenced list. Which reminds me, I
think this should be called the 'pending_fence' list, as these buffers
are to-be-fenced soon.

Merging drm_bo_handle_validate into drm_bo_setstatus_ioctl would be
great. Could we also merge drm_bo_do_validate into
drm_buffer_object_validate? I'd like to call this function
drm_bo_validate, in keeping with the remaining function names.

> That would probably be a solution that's better than the current sync
> flush. The IMHO *best* solution would have been to have the hardware
> engineers add an IRQ to the sync_flush mechanism, something that's
> actually available on every other sync flush...

Using the higher-priority queue is effectively the same as the existing
sync flush mechanism, and in fact far better, as the hardware sync flush
requires that the queue be stopped while the flushing is going on. That
can all wait while we restructure the driver to support MI_SET_CONTEXT
next year. Meanwhile, simply placing an MI_FLUSH at each fence will
suffice and eliminate the polling complexity.
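To make the MI_FLUSH-at-fence idea concrete, here is roughly the shape I
have in mind for the fence-emit path. This is only a sketch under my
assumptions, not a tested patch: the ring macros and command names are
my recollection of the existing i915 ring code in i915_drv.h/i915_dma.c,
and the breadcrumb slot is just the usual status-page convention, so
double-check both against the tree.

/*
 * Sketch only, not a tested patch.  Emit an MI_FLUSH directly in front
 * of the fence breadcrumb so the write cache and the read-only
 * (sampler/state) caches have been flushed by the time the breadcrumb
 * lands in the hardware status page.  Macro and command names are
 * assumptions based on the existing i915 ring code.
 */
static void i915_emit_flush_and_breadcrumb(struct drm_device *dev,
                                           uint32_t sequence)
{
        drm_i915_private_t *dev_priv = dev->dev_private;
        RING_LOCALS;

        BEGIN_LP_RING(6);
        /* flush the write cache plus the read-only and state caches */
        OUT_RING(MI_FLUSH | MI_READ_FLUSH | MI_EXE_FLUSH);
        OUT_RING(0);              /* noop pad */
        /* then write the fence sequence into the status page */
        OUT_RING(CMD_STORE_DWORD_IDX);
        OUT_RING(5 << 2);         /* assumed breadcrumb dword: byte offset 20 */
        OUT_RING(sequence);
        OUT_RING(0);              /* noop pad to keep the ring qword aligned */
        ADVANCE_LP_RING();
}

If we do that, the fence wait path only needs to compare the status-page
breadcrumb against the expected sequence number, and the sync_flush
polling machinery isn't needed for this case at all.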
> OK. I'll see if I can take a quick shot at a proper renaming /
> documentation of these.

A short description that suffices to explain it to me would be
appreciated; I would then extend that into longer prose throughout the
code.

> In fact there are quite a few hardware implementations of functions
> that would benefit from an interrupt (SiS, Unichrome, Intel sync
> flush, Poulsbo 2D queue space available...) but require hardware
> polling. I agree that the fence flush implementation is hairy and
> perhaps overly complex, but I think we need to keep the polling
> functionality, and just not use it in drivers that do not need
> polling.

I don't mind making some drivers poll, but at this point the DRM_WAIT_ON
macro is used in the generic code, and that cannot be switched to avoid
polling. Perhaps the wait functionality should be moved into the driver,
so that those which require polling can poll while those which do not
can use blocking mechanisms (see the sketch at the end of this message).

> Hmm, if I understand the docs correctly it's not OK to evict a texture
> unless a read flush has been performed, even if the batchbuffer
> trailing breadcrumb has been written? This impacts texture thrashing
> situations and unexpected client deaths.

That's probably weasel words in the docs to allow for read-ahead in the
textures. We've recently managed to get the Intel CPU architects to
tighten their memory access documentation; perhaps we can ask the
graphics architects to do the same. That might let us reduce the amount
of flushing needed for eviction.

Of course, I'm hoping that eviction becomes a fairly rare event -- if
you've got enough memory to hold the textures, you should have enough
GTT space to map them. The G33 aperture can be set to 2GB, which is
generally a substantial fraction of the available physical memory on a
machine.

> Before implementing a lot of code to avoid the MI_FLUSH after every
> batch buffer, are there any benchmarks available that really show that
> it's a big gain? When I tested with the i915tex driver on i915s I saw
> no such big improvements, but admittedly I tested only a limited set
> of programs that didn't use textures, to avoid triggering the
> sync_flush mechanism.

Right, we don't have any credible performance data in this area. Right
now, flushing the cache is fairly inexpensive, but caches will only get
bigger, and the penalty for losing context will only get higher. So I'd
like to make sure our architecture supports this, even if our current
chips don't see a huge benefit.

Thanks for taking time to respond.

--
[EMAIL PROTECTED]
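P.S. To put a little flesh on the 'move the wait into the driver' idea
above, here's the rough shape I'm imagining. Everything below is
illustrative only: the drm_fence_wait_ops hook and the
i915_fence_passed()/hw_fence_passed() predicates don't exist today and
are just stand-ins, and dev_priv->irq_queue is the existing i915 IRQ
wait queue as far as I recall.

#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/jiffies.h>
#include "drmP.h"
#include "i915_drv.h"

/* hypothetical helper: has the status-page breadcrumb reached 'sequence'? */
static int i915_fence_passed(drm_i915_private_t *dev_priv, uint32_t sequence);
/* hypothetical helper: poll the hardware for fence completion */
static int hw_fence_passed(struct drm_device *dev, uint32_t fence_class,
                           uint32_t sequence);

/*
 * Hypothetical per-driver wait hook: drivers with a completion IRQ can
 * block, drivers stuck with polling hardware keep a polling loop, and
 * the generic code stops caring which is which.
 */
struct drm_fence_wait_ops {
        /* return 0 once 'sequence' has passed, or a -errno value */
        int (*wait)(struct drm_device *dev, uint32_t fence_class,
                    uint32_t sequence, unsigned long timeout_jiffies);
};

/* IRQ-capable driver: sleep until the breadcrumb interrupt wakes us */
static int i915_fence_wait(struct drm_device *dev, uint32_t fence_class,
                           uint32_t sequence, unsigned long timeout)
{
        drm_i915_private_t *dev_priv = dev->dev_private;
        long ret;

        ret = wait_event_interruptible_timeout(dev_priv->irq_queue,
                        i915_fence_passed(dev_priv, sequence), timeout);
        if (ret < 0)
                return ret;             /* interrupted by a signal */
        return ret ? 0 : -EBUSY;        /* 0 means the timeout expired */
}

/* Poll-only hardware: same hook, implemented as a sleep-and-check loop */
static int polled_fence_wait(struct drm_device *dev, uint32_t fence_class,
                             uint32_t sequence, unsigned long timeout)
{
        unsigned long end = jiffies + timeout;

        while (!hw_fence_passed(dev, fence_class, sequence)) {
                if (time_after(jiffies, end))
                        return -EBUSY;
                if (signal_pending(current))
                        return -EINTR;
                schedule_timeout_interruptible(1);
        }
        return 0;
}

The point is just that the sleep-vs-poll decision moves out of
DRM_WAIT_ON and into the driver, so drivers that have a usable interrupt
never have to spin.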