On Wednesday, 09.02.2005, at 22:12 +0100, Felix Kühling wrote:
> On Wednesday, 09.02.2005, at 20:58 +0100, Roland Scheidegger wrote:
> [snip]
> > Performance with gart texturing, even in 4x mode, takes a big hit
> > (almost 50%).
> > I was not really able to get consistent performance results when both
> > texture heaps were active, I guess it's luck of the day which textures
> > got put in the gart heap and which ones in the local heap. But that
> > performance indeed got faster with a smaller gart heap is not a good
> > sign. And even if the maximum obtained in rtcw with 35MB local heap and
> > 29MB gart heap was higher than the score obtained with 35MB local heap
> > alone, there were clearly areas which ran faster with only the local heap.
> > It seems to me that the allocator really should try harder to use the
> > local heap to be useful on r200 cards, moreover it is likely that you'd
> > get quite a bit better performance when you DO have to put textures into
> > the gart heap when you revisit that later when more space becomes
> > available on the local heap and upload the still-used textures from the
> > gart heap to the local heap (in fact, should be even faster than those
> > 650MB/s, since no in-kernel-copy would be needed, it should be possible
> > to blit it directly).
>
> The big problem with the current texture allocator is that it can't tell
> which areas are really unused. Texture space is only allocated and never
> freed. Once the memory is "full" it starts kicking textures to upload
> new ones. This is the only way of "freeing" memory. Using an LRU
> strategy it has a good chance of kicking unused textures first, but
> there's no guarantee. It can't tell if a kicked texture will be needed
> the next instant. So trying to move textures from GART to local memory
> would basically mean that you blindly kick the least recently used
> texture(s) from local memory. If those textures are needed again soon
> then performance is going to suffer badly.
>
> Therefore I'm proposing a modified allocator that fails when it needs to
> start kicking too recently used textures (e.g. textures used in the
> current or previous frame). Failure would not be fatal in this case, you
> just keep the texture in GART memory and try again later. Actually you
> could use the same allocator for normal texture uploads. Just specify
> the current texture heap age as the limit.
>
> If you try to move textures back to local memory each time a texture is
> used, this would result in some kind of automatic regulation of heap
> usage. By kicking only textures that are several frames old in this
> process, you'd avoid thrashing.
>
> Currently the texture heap age is only incremented on lock contention
> (IIRC). In this scheme you'd also increment it on buffer swaps and
> remember the texture heap ages of the last two buffer swaps.
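
To make the age-limited allocation quoted above concrete, here is a minimal sketch. It is a simplified model, not the texmem.c code: the Heap and Tex types and the heap_alloc() function are hypothetical, memory is modelled as a plain byte counter (no fragmentation or alignment), and bound textures are ignored. The allocation fails, without evicting anything, as soon as it would have to kick a texture whose last use is not older than age_limit.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TEX 16

/* Hypothetical, simplified model; these are not the texmem.c structures. */
typedef struct {
    size_t   size;       /* bytes occupied in the heap */
    unsigned last_used;  /* heap age when the texture was last used */
    int      resident;   /* nonzero while the texture lives in the heap */
} Tex;

typedef struct {
    size_t capacity;
    size_t used;
    Tex   *lru[MAX_TEX]; /* index 0 = most recent, count-1 = least recent */
    int    count;
} Heap;

/* Try to place t in h. Only textures with last_used < age_limit may be
 * kicked. Returns 1 on success, 0 if that is not enough to make room. */
static int heap_alloc(Heap *h, Tex *t, unsigned now, unsigned age_limit)
{
    size_t free_bytes, reclaimable = 0;
    int i, j;

    assert(h != NULL && t != NULL);
    free_bytes = h->capacity - h->used;

    /* Walk from the LRU tail and count how much old-enough space exists. */
    i = h->count;
    while (free_bytes + reclaimable < t->size && i > 0 &&
           h->lru[i - 1]->last_used < age_limit) {
        reclaimable += h->lru[i - 1]->size;
        i--;
    }

    /* Fail before evicting anything, so a failed attempt is harmless. */
    if (free_bytes + reclaimable < t->size || i == MAX_TEX)
        return 0;

    /* Kick the selected tail entries. */
    while (h->count > i) {
        Tex *victim = h->lru[--h->count];
        victim->resident = 0;
        h->used -= victim->size;
    }

    /* Insert the new texture at the head of the LRU list. */
    for (j = h->count; j > 0; j--)
        h->lru[j] = h->lru[j - 1];
    h->lru[0] = t;
    h->count++;
    h->used += t->size;
    t->resident = 1;
    t->last_used = now;
    return 1;
}
```

In this model a normal upload would pass the current heap age as age_limit, while a GART-to-local migration would pass the age recorded a couple of buffer swaps ago and simply leave the texture in GART memory if the call returns 0.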

I simplified this idea a little further and attached a patch against
texmem.[ch]. It frees stale textures (and also placeholders for other
clients' textures) that haven't been used in 1 second when it runs out
of space on a texture heap. This way it will try a bit harder to put
textures into the first heap before using the second heap, without much
risk (I hope) of performance regressions.

I tested this on a ProSavageDDR where rendering speed appears to be the
same with local and GART textures. There was no measurable performance
regression in Quake3 and I noticed no subjective performance regression
in Torcs or Quake1 either.

Now the only thing missing in texmem.c for migrating textures from GART
to local memory would be a flag to driAllocateTexture to stop trying if
kicking stale textures didn't free up enough space (on the first texture
heap).

Anyway, I think the attached patch should already make a difference as
it is. I'd be interested how much it improves your performance numbers
with Quake3 and rtcw on r200 when both texture heaps are enabled.

> [snip]

Regards,
  Felix

-- 
| Felix Kühling <[EMAIL PROTECTED]>                     http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3  B152 151C 5CC1 D888 E595 |

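
The "stop trying" flag mentioned above does not exist yet; the following is only a sketch of how it might behave, using a simplified model rather than the real driAllocateTexture. All names here (Heap, Tex, free_stale, place, alloc_texture, first_heap_only) are hypothetical, memory is a plain byte counter, and the final fallback of kicking recently used textures is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TEX 16

/* Hypothetical, simplified model; these are not the texmem.c structures. */
typedef struct {
    size_t size;
    double last_used;    /* seconds, in the spirit of clockAge in the patch */
} Tex;

typedef struct {
    size_t capacity, used;
    Tex   *lru[MAX_TEX]; /* index 0 = most recently used */
    int    count;
} Heap;

/* Drop textures idle for more than max_idle seconds, oldest first.
 * The list is LRU sorted, so stop at the first texture that is kept. */
static void free_stale(Heap *h, double now, double max_idle)
{
    while (h->count > 0 &&
           now - h->lru[h->count - 1]->last_used > max_idle) {
        h->used -= h->lru[h->count - 1]->size;
        h->count--;
    }
}

/* Place t at the head of the LRU list if it fits; returns 1 on success. */
static int place(Heap *h, Tex *t, double now)
{
    int j;
    if (h->count == MAX_TEX || h->capacity - h->used < t->size)
        return 0;
    for (j = h->count; j > 0; j--)
        h->lru[j] = h->lru[j - 1];
    h->lru[0] = t;
    h->count++;
    h->used += t->size;
    t->last_used = now;
    return 1;
}

/* Returns the index of the heap that took the texture, or -1.
 * With first_heap_only set, give up once heap 0 cannot take it even
 * after stale textures were freed (the migration case). */
static int alloc_texture(Heap *heaps, int nr_heaps, Tex *t,
                         double now, int first_heap_only)
{
    int id;
    assert(heaps != NULL && t != NULL);
    for (id = 0; id < nr_heaps; id++) {
        if (place(&heaps[id], t, now))
            return id;
        free_stale(&heaps[id], now, 1.0);
        if (place(&heaps[id], t, now))
            return id;
        if (first_heap_only)
            break;
    }
    return -1;
}
```

A migration attempt would call alloc_texture() with first_heap_only set and treat -1 as "leave the texture in GART memory and try again later", while a normal upload would pass 0 and fall through to the second heap.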
--- ./texmem.h.~1.6.~	2005-02-02 17:20:40.000000000 +0100
+++ ./texmem.h	2005-02-10 17:44:40.000000000 +0100
@@ -101,6 +101,11 @@
                                  * value must be greater than
                                  * or equal to \c firstLevel.
                                  */
+
+    double clockAge;            /**< Clock time stamp indicating when
+                                 * the texture was last used. The unit
+                                 * is seconds.
+                                 */
 };
--- ./texmem.c.~1.10.~	2005-02-05 14:16:25.000000000 +0100
+++ ./texmem.c	2005-02-10 18:39:15.000000000 +0100
@@ -50,6 +50,7 @@
 #include "texformat.h"
 #include <assert.h>
+#include <sys/time.h>
@@ -243,6 +244,13 @@
        */
       move_to_head( & heap->texture_objects, t );
+      {
+	 struct timeval tv;
+	 if ( gettimeofday( &tv, NULL ) == 0 ) {
+	    t->clockAge = (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
+	 } else
+	    t->clockAge = 0.0;
+      }
       for (i = start ; i <= end ; i++) {
@@ -415,6 +423,15 @@
 	 t->heap = heap;
 	 if (in_use)
 	    t->bound = 99;
+
+	 {
+	    struct timeval tv;
+	    if ( gettimeofday( &tv, NULL ) == 0 ) {
+	       t->clockAge = (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
+	    } else
+	       t->clockAge = 0.0;
+	 }
+
 	 insert_at_head( & heap->texture_objects, t );
       }
    }
@@ -477,6 +494,50 @@
+/**
+ * Free stale textures
+ *
+ * \param heap     The heap from which to kick stale textures
+ * \param seconds  Kick textures unused for this many seconds
+ */
+
+static void
+driFreeStaleTextures( driTexHeap * heap, double seconds )
+{
+   driTextureObject * temp;
+   driTextureObject * cursor;
+   struct timeval tv;
+   double curTime;
+   if ( gettimeofday( &tv, NULL ) != 0 )
+      return;
+   curTime = (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
+
+   if ( heap == NULL )
+      return;
+
+   for ( cursor = heap->texture_objects.prev, temp = cursor->prev;
+	 cursor != &heap->texture_objects ;
+	 cursor = temp, temp = cursor->prev ) {
+
+      /* only consider our own textures that are not currently bound */
+      if ( cursor->bound || !cursor->tObj ) {
+	 continue;
+      }
+
+      if ( curTime - cursor->clockAge > seconds ) {
+	 driSwapOutTextureObject( cursor );
+      }
+      /* Since textures are LRU sorted, it should be safe to terminate
+       * this loop once the first texture is kept. */
+      else {
+	 break;
+      }
+   }
+}
+
+
+
+
 #define INDEX_ARRAY_SIZE 6 /* I'm not aware of driver with more than 2 heaps */
 
 /**
@@ -514,7 +575,7 @@
    /* Run through each of the existing heaps and try to allocate a buffer
-    * to hold the texture.
+    * to hold the texture.  If this fails, free stale textures and try again.
     */
    for ( id = 0 ; (t->memBlock == NULL) && (id < nr_heaps) ; id++ ) {
@@ -522,6 +583,11 @@
       if ( heap != NULL ) {
 	 t->memBlock = mmAllocMem( heap->memory_heap, t->totalSize,
 				   heap->alignmentShift, 0 );
+	 if ( t->memBlock == NULL ) {
+	    driFreeStaleTextures( heap, 1.0 );
+	    t->memBlock = mmAllocMem( heap->memory_heap, t->totalSize,
+				      heap->alignmentShift, 0 );
+	 }
       }
    }
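
The patch repeats the same gettimeofday() conversion in three places. One possible cleanup, shown here only as a sketch, would be a small helper. The names clock_seconds() and is_stale() are hypothetical and not part of texmem.c. Like the patch, the helper falls back to 0.0 if gettimeofday() fails, which makes the affected texture look maximally stale in a later comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: wall-clock time in seconds as a double.
 * Returns 0.0 if gettimeofday() fails, mirroring the patch. */
static double clock_seconds(void)
{
    struct timeval tv;
    if (gettimeofday(&tv, NULL) != 0)
        return 0.0;
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical helper: the staleness test used when freeing textures.
 * A texture is stale if it was idle for more than max_idle seconds. */
static int is_stale(double now, double last_used, double max_idle)
{
    assert(max_idle >= 0.0);
    return now - last_used > max_idle;
}
```

With these, each of the three time stamp updates would shrink to a single assignment such as t->clockAge = clock_seconds(), and the check in the loop would read is_stale(curTime, cursor->clockAge, seconds).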