Dave Airlie wrote:

>Hi,
>
>So currently the TTM interface allows the user to specify a cacheable 
>allocation, and on Intel hardware this gets conflated with using the Intel 
>snooped memory type in the GART. This is bad as the Intel snooped memory 
>type comes with its own set of special rules and sucks for lots of things.
>
>However I want to be able to use the cacheable CPU memory with the GPU to 
>a) make things go fast
>b) avoid lots of SMP cross-talking and cache flushing (see a)
>c) make buffer object creation faster
>  
>
Cool.

>So this led me to the patch at:
>http://cgit.freedesktop.org/~airlied/drm/diff/?h=intel-hackery&id=da14a0bbb8849cdc91ca87786fde90ac36fe1198
>
>I could add back the snooped option, if we want, using a driver-private 
>flag.
>  
>
Dave, I'd like to see the flag DRM_BO_FLAG_CACHED really mean 
cache-coherent memory, that is, memory that stays cache-coherent even 
while visible to the GPU. There are HW implementations out there (Poulsbo 
at least) where this option actually seems to work, although it's 
considerably slower for things like texturing. It's also a requirement 
for user bo's, since they will have VMAs that we can't kill and remap.

Could we perhaps change the flag DRM_BO_FLAG_READ_CACHED to 
DRM_BO_FLAG_MAPPED_CACHED to implement the behaviour you describe? This 
would also indicate that the buffer cannot be used for user-space 
sub-allocators, since in that case we must be able to guarantee that the 
CPU can access parts of the buffer while other parts are validated for 
the GPU.
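
To make the distinction concrete, here is a minimal sketch of how the two flags could be told apart by a caller. The EX_ names, the bit values and both helper functions are hypothetical and exist only for illustration; they are not the real DRM definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; NOT the real DRM flags. */
#define EX_BO_FLAG_CACHED        (1u << 0) /* coherent even while GPU-visible */
#define EX_BO_FLAG_MAPPED_CACHED (1u << 1) /* cached only while CPU-mapped */

/*
 * A user-space sub-allocator needs CPU access to parts of a buffer while
 * other parts are validated for the GPU, so a buffer that is only cached
 * while mapped does not qualify.
 */
static bool ex_bo_can_suballocate(uint32_t flags)
{
    return (flags & EX_BO_FLAG_MAPPED_CACHED) == 0;
}

/*
 * User buffer objects have VMAs that cannot be killed and remapped, so
 * they need the fully coherent meaning of "cached".
 */
static bool ex_bo_ok_for_user_memory(uint32_t flags)
{
    return (flags & EX_BO_FLAG_CACHED) != 0 &&
           (flags & EX_BO_FLAG_MAPPED_CACHED) == 0;
}
```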

>This patch evicts the buffer on mapping so the GPU doesn't see anything 
>via the aperture or otherwise, and flushes before validating into the 
>aperture. It doesn't contain the chipset flush patch yet, which is required 
>to actually make it work (add agp_chipset_flush to i915_dma.c before 
>submitting the batchbuffer).
>  
>
OK.
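
As I read it, the life cycle is roughly the one sketched below. This is a toy model and not the patch itself: the ex_ names and the struct are hypothetical, the flush is only counted rather than performed, refusing to validate a mapped buffer is my assumption, and the chipset flush is not modelled at all.

```c
#include <stdbool.h>

/* Hypothetical placements: CPU-only memory vs. bound into the aperture. */
enum ex_placement { EX_PLACE_LOCAL, EX_PLACE_TT };

struct ex_bo {
    enum ex_placement placement;
    bool cpu_mapped;   /* a CPU mapping currently exists */
    bool cpu_dirty;    /* CPU caches may hold data the GPU has not seen */
    unsigned flushes;  /* number of cache flushes the model would issue */
};

/* Mapping evicts the buffer, so the GPU cannot see it while the CPU uses it. */
static void ex_bo_map(struct ex_bo *bo)
{
    bo->placement = EX_PLACE_LOCAL;
    bo->cpu_mapped = true;
    bo->cpu_dirty = true;  /* conservatively assume the CPU will write */
}

static void ex_bo_unmap(struct ex_bo *bo)
{
    bo->cpu_mapped = false;
}

/* Validation flushes dirty CPU cache lines, then binds into the aperture. */
static bool ex_bo_validate(struct ex_bo *bo)
{
    if (bo->cpu_mapped)
        return false;      /* assumption: mapped buffers are not validated */
    if (bo->cpu_dirty) {
        bo->flushes++;     /* stands in for the real cache flush */
        bo->cpu_dirty = false;
    }
    bo->placement = EX_PLACE_TT;
    return true;
}
```

In this model a buffer that is validated twice without an intervening map is flushed only once, which is where the low overhead would come from.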

>This works and appears to be nice and fast: all userspace buffers can be 
>allocated _LOCAL | _CACHED and validated to _TT later without any major 
>cache flushing overhead when we have clflush, and without SMP overhead at 
>all, as cache flushing is cache coherent on the Intel chipsets I've played 
>with so far (CPU-coherent, not GPU-coherent).
>  
>
Does this mean that clflush() on one processor flushes the cache line on 
all processors in an SMP system? No need for preemption guarding and IPIs?
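
For reference, the operation I have in mind is a per-cache-line flush over the buffer's address range. Below is a minimal user-space sketch, assuming a 64-byte cache line and the SSE2 _mm_clflush intrinsic; ex_flush_range and EX_CACHE_LINE_SIZE are hypothetical names, this is not the patch's code, and it says nothing by itself about what other CPUs observe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>          /* _mm_clflush, _mm_mfence */
#define EX_HAVE_CLFLUSH 1
#endif

#define EX_CACHE_LINE_SIZE 64u  /* assumption; real code should query the CPU */

/*
 * Flush every cache line overlapping [addr, addr + len) and return the
 * number of lines visited. Without SSE2 the lines are only counted.
 */
static size_t ex_flush_range(const void *addr, size_t len)
{
    uintptr_t mask = (uintptr_t)EX_CACHE_LINE_SIZE - 1;
    uintptr_t p = (uintptr_t)addr & ~mask;   /* round down to a line start */
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;

    assert((EX_CACHE_LINE_SIZE & (EX_CACHE_LINE_SIZE - 1)) == 0);
    if (len == 0)
        return 0;

#ifdef EX_HAVE_CLFLUSH
    _mm_mfence();                /* order earlier stores before the flushes */
#endif
    for (; p < end; p += EX_CACHE_LINE_SIZE) {
#ifdef EX_HAVE_CLFLUSH
        _mm_clflush((const void *)p);
#endif
        lines++;
    }
#ifdef EX_HAVE_CLFLUSH
    _mm_mfence();                /* wait for the flushes to complete */
#endif
    return lines;
}
```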

>I of course need to make this code not so x86 specific, so I might add a 
>page flush hook to the driver interface and put the flushing code in the 
>driver side.
>
>This also leads me into backwards compatibility: the chipset flushing 
>changes to AGP are required for all of this good stuff. Options are 
>1) resurrect linux-agp-compat and add chipset flushing code - easier
>2) try and hack chipset flushing into drm_compat.c - probably more 
>difficult than I would like to bother with..
>
>So comments please on whether a comeback for linux-agp-compat is a good or 
>bad thing..
>
>  
>
Perhaps an intel-specific TTM backend?

>Dave.
>
>
>  
>
/Thomas



--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
