Texture replacement policy and occlusion queries

2006-01-16 Thread Stephane Marchesin

Hi,

I was considering how complicated it can be to implement a texture
replacement policy, and then I had the following idea: we could make
use of hardware occlusion queries on cards that support them to
determine actual texture usage and thus have a good texture replacement
policy. Here is a simplified view of how this could work:

- a usage counter is added to each texture
- each time a texture is bound, a query is started
- each time a texture is unbound, the counter is read back and added to 
the corresponding texture counter
- after a number of frames, we are able to compute the number of pixels 
actually contributed to by each texture, and thereby determine texture 
usage on the fly. Then, we can use this information to move textures 
to/from agp/video ram accordingly.
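
The steps above can be sketched in plain C as follows. This is only an illustration: the hardware counter is simulated by a variable, and every name here (tex_usage, tex_bind, tex_unbind, hw_draw, tex_least_used) is hypothetical and not taken from any driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-texture bookkeeping; not taken from a real driver. */
struct tex_usage {
    unsigned long pixels;  /* accumulated count of pixels that passed */
    int query_active;      /* nonzero while a query runs for this texture */
};

/* Stand-in for the hardware occlusion counter (an assumption for this
 * sketch; a real driver would read the value back from the card). */
static unsigned long hw_samples_passed;

/* Texture bound: start a query by resetting the simulated counter. */
static void tex_bind(struct tex_usage *t)
{
    assert(t != NULL && !t->query_active);
    hw_samples_passed = 0;
    t->query_active = 1;
}

/* Simulates the hardware drawing n visible pixels with the bound texture. */
static void hw_draw(unsigned long n)
{
    hw_samples_passed += n;
}

/* Texture unbound: read the counter back and add it to the texture total. */
static void tex_unbind(struct tex_usage *t)
{
    assert(t != NULL && t->query_active);
    t->pixels += hw_samples_passed;
    t->query_active = 0;
}

/* After some frames: index of the texture with the smallest total, which
 * would be the first candidate to move out of video RAM. */
static size_t tex_least_used(const struct tex_usage *texs, size_t n)
{
    size_t i, best = 0;

    assert(texs != NULL && n > 0);
    for (i = 1; i < n; i++)
        if (texs[i].pixels < texs[best].pixels)
            best = i;
    return best;
}
```

A caller would wrap each draw with tex_bind()/tex_unbind() and consult tex_least_used() every few frames. The sketch handles one bound texture at a time; multitexturing would need one counter per texture unit or some way of splitting the result.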


Given that occlusion queries are virtually free (at least when supported 
by the underlying hardware) I think this approach could work quite well. 
It is also possible to extend it to multitexturing without too much 
trouble. What do you think ?


Stephane



---
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637alloc_id=16865op=click
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


RFC, drm low-level infrastructure for GART dynamic memory

2006-01-16 Thread Thomas Hellström

Hi,

Following Keith's proposal on AGP memory manager design,
I have a first-shot proposal of an interface that lets the memory
manager bind and unbind drm maps into the GTT, while at the same time
taking care of caching issues.


The key issues are

a) Is the drm able to keep track of _all_ vmas referencing a drm map, 
even after a fork(), that is, will a vma_open() be called on a fork()?


b) Is it sufficient to first mark the kernel linear map PTEs uncacheable
and then mark the vma PTEs uncacheable? Will this create a problem on
certain processors during the short time the mappings conflict? The
solution to this will probably be to define DRM_TTM_PARANOID in the code
below, but I guess that the try_unmap_page() function will generate some
significant overhead.


The interface proposal is in the attached file, comments are much 
appreciated.


/Thomas

8< --code snippet---

    if (!ttm->cache_adjusted && ttm->needs_cache_adjust) {
        for (cur_page = ttm->pages; cur_page < last_page; ++cur_page) {
#ifdef DRM_TTM_PARANOID
            if (try_to_unmap_page(page) != SWAP_SUCCESS) {
                drm_unbind_ttm(ttm);
                return -1;
            }
#endif
            if (PageHighMem(page) && page_addr(cur_page) != NULL) {
                drm_unbind_ttm(ttm);
                return -1;
            }
            change_page_attr(cur_page, 1, PAGE_KERNEL_NOCACHE);
        }
        for (pt = ttm->vmalist, prev = NULL; pt; prev = pt, pt = pt->next) {
            /*
             * Stolen from memprotect.c. Assumes we can keep track of
             * mapping vmas in a list for each ttm.
             */
            drm_change_protection(pt->vma, pt->vma->vm_start,
                                  pt->vma->vm_end,
                                  pgprot_noncached(pt->vma->vm_prot));
            pt->vma->vm_prot = pgprot_noncached(pt->vma->vm_prot);
        }
        global_flush_tlb();
    }


Description of a drm_ttm API. 

Concept - A drm_ttm_t is conceptually a drm map consisting of a physically
discontinuous page range mappable into userspace and which can be bound
to a hardware-dependent translation table. Typically that translation
table is the GART, but other similar devices could be considered
(on-board PCIe translation tables?). The limitation of the current
DRM agp map implementation is that the pages cannot be mapped into
userspace other than through the AGP aperture, and read access through
the AGP aperture is painfully slow.


In drm, a ttm is similar in concept to a normal drm shared memory map.

/*
 * Registers a drm ttm referenced by handle. Memory is allocated and locked.
 * The handle is used by drmMap to map the ttm into userspace, using nopage()
 * on a page-fault basis. DRM internally keeps track of the vmas mapping the
 * ttm.
 * Problem: What happens on fork? How can we register the copied vma?
 * Called by the user through an IOCTL similar to drmAddMap(), or by the
 * memory manager.
 *
 * drm_ttm_t *drm_create_ttm(drm_device_t *dev, unsigned long size,
 *                           drm_handle_t *handle);
 */


/*
 * Binds the ttm into the translation table at offset offset. Used by the
 * memory manager.
 *
 * 1) Checks the backend (AGP driver) if we need to make memory uncacheable.
 *    If so, there are two solutions:
 *    a) We know that we keep track of all vmas (see Problem above regarding
 *       copied vmas). We run a scheme similar to memprotect to mark all PTEs
 *       on all vmas mapping this ttm as uncacheable. We also mark the vmas
 *       and the kernel mapping for each page uncacheable.
 *       This is the preferred method.
 *    b) We don't keep track of vmas resulting from a fork. We need to unmap
 *       each page in the ttm from all its mappings using try_unmap_page().
 *       We then mark all vmas calling nopage() uncacheable and
 *       set the correct caching policy when pages are remapped. The kernel
 *       mapping for each page is marked uncacheable.
 * 2) Calls backend bind function. (AGPBind)
 */

int drm_bind_ttm(drm_device_t *dev, drm_ttm_t *ttm, unsigned long offset);

/*
 * Unbinds the ttm. Restores previous caching policy using either method a)
 * or b) above.
 * If the ttm is already evicted, only the caching policy is restored.
 */

int drm_unbind_ttm(drm_device_t *dev, drm_ttm_t *ttm);

/*
 * Unbinds the ttm. Keeps the backend caching policy. Used by the Aperture
 * memory manager.
 */

int drm_evict_ttm(drm_device_t *dev, drm_ttm_t *ttm);

/*
 * Rebinds a previously evicted ttm. Used by the Aperture memory manager.
 */

int drm_rebind_ttm(drm_device_t *dev, drm_ttm_t *ttm, unsigned long offset);

/*
 * Unbinds a ttm if it is bound, then unregisters the ttm. The vmas continue
 * to live their life, but any page fault will create a segmentation
 * violation.
 */

int drm_destroy_ttm(drm_device_t *dev, drm_ttm_t *ttm);
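
The bind / evict / rebind / unbind semantics described in the comments above can be read as a small state machine. The following is a userspace sketch of that reading only, not part of the proposal: ttm_model, the TTM_* states and the model_* functions are hypothetical names, the backend is assumed to require uncached memory, and returning -1 for a call made in the wrong state is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states implied by the proposed calls. */
enum ttm_state { TTM_UNBOUND, TTM_BOUND, TTM_EVICTED };

struct ttm_model {
    enum ttm_state state;
    int uncached;          /* nonzero while the backend caching policy applies */
    unsigned long offset;  /* offset in the translation table when bound */
};

/* Models drm_bind_ttm(): adjust caching, then bind at offset. */
static int model_bind(struct ttm_model *t, unsigned long offset)
{
    assert(t != NULL);
    if (t->state != TTM_UNBOUND)
        return -1;
    t->uncached = 1;
    t->offset = offset;
    t->state = TTM_BOUND;
    return 0;
}

/* Models drm_evict_ttm(): unbind but keep the backend caching policy. */
static int model_evict(struct ttm_model *t)
{
    assert(t != NULL);
    if (t->state != TTM_BOUND)
        return -1;
    t->state = TTM_EVICTED;
    return 0;
}

/* Models drm_rebind_ttm(): only valid for a previously evicted ttm. */
static int model_rebind(struct ttm_model *t, unsigned long offset)
{
    assert(t != NULL);
    if (t->state != TTM_EVICTED)
        return -1;
    t->offset = offset;
    t->state = TTM_BOUND;
    return 0;
}

/* Models drm_unbind_ttm(): restore the previous caching policy. For an
 * already evicted ttm only the caching policy changes. */
static int model_unbind(struct ttm_model *t)
{
    assert(t != NULL);
    if (t->state == TTM_UNBOUND)
        return -1;
    t->uncached = 0;
    t->state = TTM_UNBOUND;
    return 0;
}
```

The point of separating evict from unbind in this model is that an evicted ttm skips the expensive cache-attribute change on the way out and on the way back in.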



Re: Texture replacement policy and occlusion queries

2006-01-16 Thread Roland Scheidegger

Stephane Marchesin wrote:

> Hi,
>
> I was considering how complicated it can be to implement a texture
> replacement policy, and then I had the following idea: we could make
> use of hardware occlusion queries on cards that support them to
> determine actual texture usage and thus have a good texture replacement
> policy. Here is a simplified view of how this could work:
>
> - a usage counter is added to each texture
> - each time a texture is bound, a query is started
> - each time a texture is unbound, the counter is read back and added to
> the corresponding texture counter
> - after a number of frames, we are able to compute the number of pixels
> actually contributed to by each texture, and thereby determine texture
> usage on the fly. Then, we can use this information to move textures
> to/from agp/video ram accordingly.
>
> Given that occlusion queries are virtually free (at least when supported
> by the underlying hardware) I think this approach could work quite well.
> It is also possible to extend it to multitexturing without too much
> trouble. What do you think ?

An interesting idea. I have some doubts, though, whether it is worthwhile
to implement this (though occlusion queries on their own certainly seem
useful). One reason I think it might not be too useful is that gart
texturing is not _that_ slow usually, so using feedback might be
overkill. Also, just because you know how many pixels are affected by a
texture doesn't really tell you how much memory bandwidth for texture
reads is necessary; it only gives you a rough idea (since there are
texture caches, and also the texture associated with an object might be
huge but reads only occur for a small mipmap etc.).
I think some mechanism not using any feedback (i.e. just counting how
many times a texture is used per frame and taking texture size into
account) would be quite acceptable for now - and in a world where XAA
wouldn't steal half your local video memory for pixmaps, memory pressure
should be quite a bit lower too.
And while occlusion queries might be virtually free in terms of hardware 
resources, the code might not look pretty.
I think at least newer gpus should probably have even more appropriate 
performance counters if you really want to use a feedback mechanism.
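
A feedback-free mechanism of the kind mentioned above (bind count per frame plus texture size) could look roughly like this. It is only a sketch: tex_stat, evict_before and pick_eviction are hypothetical names, and ranking by "binds per byte" is one assumed way of taking size into account, not something specified in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-texture statistics gathered without hardware feedback. */
struct tex_stat {
    unsigned long binds;  /* times the texture was bound in the sampled frames */
    unsigned long bytes;  /* texture size in bytes, assumed nonzero */
};

/* Nonzero if a should be evicted before b, i.e. a has fewer binds per byte.
 * The ratio comparison is cross-multiplied to avoid floating point. */
static int evict_before(const struct tex_stat *a, const struct tex_stat *b)
{
    return (unsigned long long)a->binds * b->bytes <
           (unsigned long long)b->binds * a->bytes;
}

/* Index of the texture to move out of video RAM first. */
static size_t pick_eviction(const struct tex_stat *t, size_t n)
{
    size_t i, best = 0;

    assert(t != NULL && n > 0);
    for (i = 1; i < n; i++)
        if (evict_before(&t[i], &t[best]))
            best = i;
    return best;
}
```

With this ranking a large texture that is bound as often as a small one goes first, since evicting it frees more memory for the same number of uses.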


Roland

