Dave Airlie wrote:
>> Dave, I'd like to see the flag DRM_BO_FLAG_CACHED really mean cache-coherent
>> memory, that is cache coherent also while visible to the GPU. There are HW
>> implementations out there (Poulsbo at least) where this option actually seems
>> to work, although it's considerably slower for things like texturing. It's
>> also a requirement for user BOs since they will have VMAs that we can't kill
>> and remap.
> 
> Most PCIE cards will be cache coherent, but AGP cards mostly aren't, so we 
> need to think about whether a generic _CACHED flag makes sense. For something 
> like radeon, would I have to pass different flags depending on the GART 
> type? That seems ugly, so maybe a separate flag makes more sense.
> 
>> Could we perhaps change the flag DRM_BO_FLAG_READ_CACHED to mean
>> DRM_BO_FLAG_MAPPED_CACHED to implement the behaviour you describe? This will
>> also indicate that the buffer cannot be used for user-space sub-allocators,
>> as we in that case must be able to guarantee that the CPU can access parts
>> of the buffer while other parts are validated for the GPU.
> 
> Yes, to be honest, sub-allocators should be avoided for most use-cases if 
> possible; we should be able to make the kernel interface fast enough for 
> most things if we don't have to switch caching flags on the fly at 
> map/destroy etc.

Hmm - if that were true, why do we have malloc() and friends - aren't they 
just sub-allocators for brk() and mmap()?

There is more to this than performance - applications out there can 
allocate extraordinarily large numbers of small textures, which can only 
sanely be dealt with as light-weight userspace suballocations of a 
sensible-sized buffer.  (We don't do this yet, but will need to at some 
point!)  The reasons for this are granularity (i.e. wasted space in the 
allocation), the memory overhead of managing all those allocations, and 
only thirdly, performance.
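
To make the shape of that concrete, here's roughly what such a suballocator 
looks like.  Note that bo_pool, bo_pool_alloc() and the field names are 
made-up illustration, not anything in the current drmBO interface:

/* Minimal sketch of a userspace suballocator carving small textures (or
 * state objects) out of one sensibly-sized buffer object.  The backing
 * buffer is created and mapped once via the kernel interface; after that,
 * each small allocation is just pointer arithmetic in userspace. */

#include <stddef.h>
#include <stdint.h>

struct bo_pool {
    void     *cpu_map;     /* CPU mapping of the backing buffer object */
    uint32_t  gpu_offset;  /* GPU offset of the backing buffer object  */
    size_t    size;        /* total size of the backing buffer         */
    size_t    head;        /* bump pointer: next free byte             */
};

/* Sub-allocate 'size' bytes aligned to 'align'.  Returns 0 on success and
 * hands back both a CPU pointer and a GPU offset into the shared buffer. */
static int bo_pool_alloc(struct bo_pool *pool, size_t size, size_t align,
                         void **cpu, uint32_t *gpu)
{
    size_t offset = (pool->head + align - 1) & ~(align - 1);

    if (offset + size > pool->size)
        return -1;     /* pool exhausted - caller falls back to a real BO */

    *cpu = (uint8_t *)pool->cpu_map + offset;
    *gpu = pool->gpu_offset + (uint32_t)offset;
    pool->head = offset + size;
    return 0;
}

A real one would obviously need freeing and fencing against the GPU, but 
even this trivial version shows why the per-object overhead is so much 
smaller than one kernel-managed buffer per texture.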

If you think about what goes on in a 3d driver, you are always doing 
sub-allocations of some sort or another, though that's more obvious when 
you start doing state objects that have an independent lifecycle as 
opposed to just emitting state linearly into a command buffer.  For 
managing objects of a few dozen bytes, obviously you are going to want 
to do that in userspace.

So there is a continuum where successively larger buffers increasingly 
justify whatever additional cost there is to go directly to the kernel 
to allocate them.  But for sufficiently small or frequently allocated 
buffers, there will always be a crossover point where it is faster to do 
it in userspace.
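
Expressed as code, that policy is just a size threshold.  The numbers and 
names here are made up - SUBALLOC_THRESHOLD would need tuning against 
whatever the kernel path ends up costing, and kernel_bo_alloc() stands in 
for the real ioctl path:

#define SUBALLOC_THRESHOLD (16 * 1024)   /* assumed crossover point */

/* Stand-in for the real kernel buffer allocation path. */
static int kernel_bo_alloc(size_t size, size_t align,
                           void **cpu, uint32_t *gpu);

static int alloc_buffer(struct bo_pool *pool, size_t size, size_t align,
                        void **cpu, uint32_t *gpu)
{
    /* Small allocations: stay in userspace, reusing the pool above. */
    if (size <= SUBALLOC_THRESHOLD &&
        bo_pool_alloc(pool, size, align, cpu, gpu) == 0)
        return 0;

    /* Large allocations (or an exhausted pool): pay the kernel round trip. */
    return kernel_bo_alloc(size, align, cpu, gpu);
}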

It certainly makes sense to speed up the kernel paths, but that won't 
make the crossover point go away - it'll just shift it more or less 
depending on how successful you are.

Keith


