Keith Packard wrote:

>On Fri, 2007-12-21 at 13:21 +0100, Thomas Hellström wrote:
>
>>Keith,
>>This is purely an optimization.
>>Buffers that are evicted due to a memory aperture space shortage end up
>>in a state where they are unbound from the aperture, but retain the
>>uncached state, assuming the next operation on them will be "validate".
>
>Ok, so we have several primitive operations here and several useful
>combinations. I think we just need some more consistent naming and
>documentation to reflect which function performs which primitive
>operations, and what the intended use cases are. Right now, the function
>names are not descriptive to me, and there is certainly a lack of
>documentation in the code.
There is indeed, although you and Eric have improved the situation 
considerably.
I hope there will be some time available to document things more 
properly soon.

>>In that case they are just bound back. However, if they are instead 
>>accessed by the CPU, the caching state needs to be fixed up. Hence the 
>>name. The caching fixup is, as you know, a very costly operation.
>
>Now that we know how to flush the GMCH GWB, we can leave pages mapped to
>the CPU at all times, so this turns out to be 'free' if we know when
>objects have been accessed, which validation flags will tell us. So, my
>expectation is that pages would never be unmapped from the CPU just to
>be mapped to the GPU for the Intel driver at least. Not to say that
>these operations shouldn't be made available, just that they aren't
>useful for this driver.
OK.
However, it's not really the CPU unmapping/mapping itself that is 
costly, but the caching-state change of the global kernel page mapping.
While the Intel driver can currently work around this, the workaround 
does create an illegal page aliasing (the same physical page mapped with 
inconsistent caching attributes), which upsets some processors (read: 
AMD). It would be best to keep the low-performance but correct option as 
the default for other drivers.

>It might be useful to clarify the ttm state names (in particular, it
>seems like ttm_unbound and ttm_evicted are reversed), then we could
>create functions like 'drm_ttm_unbind' which would move a buffer from
>bound to unbound, and drm_ttm_evict which would move a buffer from bound
>or unbound to evicted. That might lessen the unexposed ordering
>dependencies between the various functions. For instance, calling
>drm_ttm_fixup_caching to unbind a buffer won't work unless that buffer
>is already evicted.
Agreed.

>>drm_bo_handle_validate() needed to be split for the superioctl, and 
>>drm_bo_do_validate() was created. It's possible this can be simplified.
>
>Ok, with superioctl, what I think we want is drm_bo_validate which does:
>
>       error out if the buffer is unfenced
>       lock bo mutex
>       whack flags
>       move the buffer (waiting, if necessary)
>       unlock bo mutex
Yup, that would be drm_bo_do_validate(). I think a possible 
simplification would be to merge
drm_bo_handle_validate() into drm_bo_setstatus_ioctl().

>>1) Intel read/write sync flushes must be issued after the EXE TYPE flag 
>>has signaled, so we flush the correct operation. This was implemented 
>>and optimized for the case where the kernel needed to evict a texture 
>>(read flush) and didn't want to wait for the flush operation to pass 
>>down the command queue, effectively causing a command queue flush and 3D 
>>engine idle. As discussed before we ended up not using the sync flush 
>>code...
>
>My key concern with this code at present is that it requires polling,
>where the CPU wakes up frequently and bangs on the hardware. This is
>costly in CPU time and power, and also delays application execution
>whenever a flush is necessary as the polling code does go briefly idle
>between operations.
>
>I'm ok with removing it then; the sync flush code would require that the
>CPU poll the graphics engine, something I'm really uninterested in
>seeing happen. As an alternative, we can use a separate command queue at
>higher priority to flush the GPU, and that can actually give us an
>interrupt.
That would probably be a better solution than the current sync flush.
The *best* solution, IMHO, would have been for the hardware engineers 
to add an IRQ to the sync_flush mechanism, something that's actually 
available on every other sync flush...

However, since we're not using the sync flush anywhere, I'm 
completely OK with having it removed.

>The portion of the code which has me completely puzzled is the use of
>the flush_mask and submitted_flush fields. They get modified and used
>without any obvious sense. If we could eliminate those and just use the
>fence->type (which should be fence->types_waiting) and fence->signaled
>(which should be fence->types_signaled), the code would be far more
>understandable. 
OK. I'll see if I can take a quick shot at a proper renaming / 
documentation of these.

>>2) Unichrome-type hardware: To guarantee, for example, HighQualityVideo 
>>engine idle, the EXE type fence must have signaled, and the engine must 
>>signal HQV engine idle using a hardware flag...
>
>Is there no interrupt available here? If not, then some kind of polling
>is clearly required, but I'd much rather that happen in the Unichrome
>driver itself, rather than complicating the general DRM driver
>structure.
In fact, there are quite a few hardware implementations of functions 
that would benefit from an interrupt (SiS, Unichrome, Intel sync flush, 
Poulsbo 2D queue space available..) but require hardware polling. I 
agree that the fence flush implementation is hairy and perhaps overly 
complex, but I think we need to keep the polling functionality, and 
simply not use it in drivers that don't need polling.

>>So if the caller, for example, wants to wait for an Intel read-type fence 
>>to signal, the code must first make sure the EXE flag has signaled; then 
>>and only then does it indicate to the device-dependent code that a flush 
>>is needed, and either sleep-waits or polls the device-dependent code for 
>>flushing to complete.
>
>read fencing only requires that we have fences after each operation.
>It's the write fencing which is expensive as that requires an MI_FLUSH,
>which we'd love to remove from the end of every batch buffer at some
>point. However, as described above, we can place the MI_FLUSH in a
>separate command stream and run that in the middle of other existing
>commands, allowing us to avoid the polling mode of the current code.
Hmm, if I understand the docs correctly, it's not OK to evict a texture 
unless a read flush has been performed, even if the batchbuffer's trailing 
breadcrumb has been written? This impacts texture-thrashing situations 
and unexpected client deaths.

Before implementing a lot of code to avoid the MI_FLUSH after every 
batch buffer, are there any benchmarks available that show it's really 
a big gain? When I tested with the i915tex driver on i915s, I saw no 
big improvement, but admittedly I tested only a limited set of 
programs that didn't use textures, to avoid triggering the sync_flush 
mechanism.

/Thomas


--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
