Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Jon Smirl
On Thu, 10 Feb 2005 20:14:00 -0800, Eric Anholt <[EMAIL PROTECTED]> wrote:
> Is there evidence that this is/would be in fact faster?

That's how the networking drivers work, and they may be the fastest
drivers in the system.
But it has not been coded for AGP, so nobody knows for sure. It has to
be faster though: having the CPU do the copy will cause the TLB
to be flushed as you walk through all of the pages. Having the GPU do
the copy is even worse since it moves across AGP.

We have bigger problems to chase. Plus implementing it this way
probably has a bunch of architecture specific problems I don't know
about. But I'm sure it would work on the x86.

After we get X on GL up on mesa-solo I can look at changing the
texture copy code.

-- 
Jon Smirl
[EMAIL PROTECTED]


---
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Dave Airlie
> > it. Instead mark its page table entries as copy on write. Get the
> > physical address of the page and set it into the GART. Now the GPU can
> > get to it with zero copies. When you are done with it, check and see
> > if the app caused a copy on write, if so free the page, else just
> > remove the COW flag.
>
> Is there evidence that this is/would be in fact faster?

No, but I could practically guarantee that anything is faster than the
3-4 copies a radeon texture goes through at the moment.

Dave.

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person





Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Eric Anholt
On Thu, 2005-02-10 at 22:23 -0500, Jon Smirl wrote:
> On Thu, 10 Feb 2005 21:59:29 -0500, Owen Taylor <[EMAIL PROTECTED]> wrote:
> > That should allow a straight copy from data you create to memory the
> > card can texture from, which is about as good as possible.
> 
> If you have a big AGP aperture to play with there is a faster way.
> When you get the call to copy the texture from user space, don't copy
> it. Instead mark its page table entries as copy on write. Get the
> physical address of the page and set it into the GART. Now the GPU can
> get to it with zero copies. When you are done with it, check and see
> if the app caused a copy on write, if so free the page, else just
> remove the COW flag.

Is there evidence that this is/would be in fact faster?

-- 
Eric Anholt[EMAIL PROTECTED]  
http://people.freebsd.org/~anholt/ [EMAIL PROTECTED]




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Jon Smirl
On Thu, 10 Feb 2005 21:59:29 -0500, Owen Taylor <[EMAIL PROTECTED]> wrote:
> That should allow a straight copy from data you create to memory the
> card can texture from, which is about as good as possible.

If you have a big AGP aperture to play with there is a faster way.
When you get the call to copy the texture from user space, don't copy
it. Instead mark its page table entries as copy on write. Get the
physical address of the page and set it into the GART. Now the GPU can
get to it with zero copies. When you are done with it, check and see
if the app caused a copy on write, if so free the page, else just
remove the COW flag.

-- 
Jon Smirl
[EMAIL PROTECTED]




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Owen Taylor
Dave Airlie wrote:
> > A better scheme for a movie player would be to create a single texture
> > and then keep replacing its contents. Or use two textures and double
> > buffer. But once created these textures would not move in the LRU list
> > unless you started something like a game in another window.
>
> if we supported that in any reasonable fashion (at least on radeon/r200),
> movie players are very texture upload bound, well at least on my embedded
> system, I do a lot of animation with movies, and mngs and arrays of pngs,
> and most of my time is spent in memcpy and texstore_rgba, this is a
> real pain for me, and I'm slowly gathering enough knowledge to do a great
> big hack for my own internal use,

Perhaps a wild idea ... does APPLE_client_texture do what you want? If
so then it might be a lot simpler and more reusable to
test/optimize/fixup that than to start from scratch.

That should allow a straight copy from data you create to memory the
card can texture from, which is about as good as possible.

For subimage modification the spec seems to permit modifying the data in 
place then calling TexSubImage on the subregion with a pointer into
the original data to notify of the change.

Regards,
Owen


Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Roland Scheidegger
Felix Kühling wrote:
> I don't think my algorithm is much more complicated. It can be
> implemented by gradual improvements of the current algorithm (freeing
> stale texture memory is one step) which helps avoiding unexpected
> performance regressions. At the moment I'm not planning to rewrite it
> from scratch, especially because I can't test on any hardware where I
> can actually measure great performance improvements ATM.

I'm not sure what a really good implementation would look like, but you
could try lowering gart speed to 1x with a savage to see a performance
difference between local and gart texturing. Though I'm not convinced
the savages are actually fast enough to even take a hit with agp 1x...

Roland


Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Roland Scheidegger
Jon Smirl wrote:
> AGP 8x should just be able to keep up with 1280x1024x24b 60
> times/sec.

AGP 4x should be enough. Remember I got 600MB/s max throughput. Not with
24bit textures though, the Mesa RGBA-BGRA conversion takes WAY too much
time to achieve that.

> How does mesa access AGP memory from the CPU side? AGP memory is
> system memory which the AGP makes visible to the GPU.  Are we using
> the GPU to load textures into AGP memory or is it being done entirely
> on the main CPU with a memcopy?

Depends on the driver. radeon/r200 use a GPU blit. Might be suboptimal
but at least it handles things like tiling (when the GPU blitter can do
it) automatically. I'm not sure, but couldn't the radeon blitter actually
do rgba-bgra conversion too, for instance?

> For things like a movie player we should even be able to give it a
> pointer to the texture in system memory(AGP space) and let it
> directly manipulate the texture buffer. Doing that would require
> playing with the page tables to preserve protection.

This seems to be exactly what the client extension of the r200 driver is
intended for. But for normal apps, it's useless (and for the most part
even for apps which could make good use of it, since it's an extension
almost no one uses anyway).

Roland




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Jon Smirl
AGP 8x should just be able to keep up with 1280x1024x24b 60 times/sec.
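
As a rough check of that figure, the sketch below computes the upload rate needed for one full frame per refresh. The function name and parameters are illustrative and not taken from Mesa; 3 bytes per pixel is assumed for 24-bit data. At 1280x1024 and 60 frames/sec this comes to about 236 MB/s, against nominal AGP peak rates of roughly 1066 MB/s (4x) and 2133 MB/s (8x); sustained rates are lower.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second needed to upload one full frame per refresh.
 * Hypothetical helper: pure arithmetic, nothing here is measured. */
static uint64_t upload_bytes_per_sec(uint32_t width, uint32_t height,
                                     uint32_t bytes_per_pixel, uint32_t fps)
{
    assert(bytes_per_pixel > 0 && fps > 0);
    /* Widen before multiplying so large frames cannot overflow 32 bits. */
    return (uint64_t)width * height * bytes_per_pixel * fps;
}
```

If the data is padded to 4 bytes per pixel, the same call gives about 315 MB/s.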

How does mesa access AGP memory from the CPU side? AGP memory is
system memory which the AGP makes visible to the GPU.  Are we using
the GPU to load textures into AGP memory or is it being done entirely
on the main CPU with a memcopy?

For things like a movie player we should even be able to give it a
pointer to the texture in system memory(AGP space) and let it directly
manipulate the texture buffer. Doing that would require playing with
the page tables to preserve protection.

-- 
Jon Smirl
[EMAIL PROTECTED]




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Dave Airlie
>
> A better scheme for a movie player would be to create a single texture
> and then keep replacing its contents. Or use two textures and double
> buffer. But once created these textures would not move in the LRU list
> unless you started something like a game in another window.

If only we supported that in any reasonable fashion (at least on
radeon/r200). Movie players are very texture-upload bound, at least on my
embedded system. I do a lot of animation with movies, MNGs and arrays of
PNGs, and most of my time is spent in memcpy and texstore_rgba. This is a
real pain for me, and I'm slowly gathering enough knowledge to do a great
big hack for my own internal use.

Dave.

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person





Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Roland Scheidegger
Felix Kühling wrote:
> I simplified this idea a little further and attached a patch against
> texmem.[ch]. It frees stale textures (and also placeholders for other
> clients' textures) that haven't been used in 1 second when it runs out of
> space on a texture heap. This way it will try a bit harder to put
> textures into the first heap before using the second heap, without much
> risk (I hope) of performance regressions.
>
> I tested this on a ProSavageDDR where rendering speed appears to be the
> same with local and GART textures. There was no measurable performance
> regression in Quake3 and I noticed no subjective performance regression
> in Torcs or Quake1 either.
>
> Now the only thing missing in texmem.c for migrating textures from GART
> to local memory would be a flag to driAllocateTexture to stop trying if
> kicking stale textures didn't free up enough space (on the first texture
> heap).
>
> Anyway, I think the attached patch should already make a difference as
> it is. I'd be interested how much it improves your performance numbers
> with Quake3 and rtcw on r200 when both texture heaps are enabled.

I've done a couple of benchmarks. All results are "fglrx-boosted", so to
speak (too lazy to reboot).

q3,   local 45MB or 35MB:  145 fps
rtcw, local 45MB:           95 fps
rtcw, local 35MB:           76 fps

With both heaps, local size 35MB, GART texture size 61MB:

q3,   old allocator:  105-125 fps
rtcw, old allocator:    70-84 fps
q3,   new allocator:  108-126 fps
rtcw, new allocator:    71-85 fps

This does not seem to really make a difference.

One interesting thing I noticed though is that it is actually not really
a "range" of results, but only some distinct values. For rtcw, the
scores were always very close to either 70, 77 or 85 fps (within 1
frame); out of 10 runs maybe 6 were around 77, 2 around 70 and 2 around
85. Quake3 mostly ran at around 125 fps but once in a while was just
below 110.

Roland


Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Felix Kühling
On Thursday, 10.02.2005, at 17:40 -0500, Jon Smirl wrote:
> On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling <[EMAIL PROTECTED]> wrote:
> > This scheme would give good results with movie players that need fast
> > texture uploads and typically use each texture exactly once. It would
> 
> Movie players aren't even close to being texture bandwidth bound. The

That's not my experience. Optimizations in the texture upload path,
using the AGP heap and partial texture uploads had a big impact on
mplayer -vo gl performance on my ProSavageDDR (factor 2-3 all of them
taken together).

> demote from local to AGP scheme would cause two copies on each frame
> but there is plenty of bandwidth. But this assumes that the movie
> player creates a new texture for each frame.
> 
> A better scheme for a movie player would be to create a single texture
> and then keep replacing its contents.

You're right, that's what actually happens in mplayer. It uses
glTexSubImage2D because it typically changes only a part of a texture
with power-of-two dimensions.
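
To illustrate why a partial update helps here, the sketch below compares the texels in a video frame with the texels in the enclosing power-of-two texture. The helper names are hypothetical and the 720x576 frame size is only an assumed example; the code does no GL calls, it only does the size arithmetic that a glTexSubImage2D-style update benefits from.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= n, for 1 <= n <= 2^31. */
static uint32_t next_pow2(uint32_t n)
{
    uint32_t p = 1;
    assert(n >= 1 && n <= UINT32_C(0x80000000));
    while (p < n)
        p <<= 1;
    return p;
}

/* Texels touched per frame when only the video rectangle is replaced. */
static uint64_t subimage_texels(uint32_t frame_w, uint32_t frame_h)
{
    return (uint64_t)frame_w * frame_h;
}

/* Texels in the enclosing power-of-two texture. */
static uint64_t texture_texels(uint32_t frame_w, uint32_t frame_h)
{
    return (uint64_t)next_pow2(frame_w) * next_pow2(frame_h);
}
```

For an assumed 720x576 frame the texture would be 1024x1024, so the sub-image covers 414720 of 1048576 texels, or roughly 40% of a full re-upload.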

> Or use two textures and double
> buffer. But once created these textures would not move in the LRU list
> unless you started something like a game in another window.

Yes, they would move in the LRU list. That's why it's called "least
recently used" not "least recently created". ;-)

So I would have to modify my scheme to reset the usage count/frequency
when a texture image is changed, such that a texture that is updated
very frequently would not be promoted to local memory.

On Thursday, 10.02.2005, at 17:34 -0500, Jon Smirl wrote:
> On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling <[EMAIL PROTECTED]> wrote:
> > This means you copy a texture when you don't know if or when you're
> > going to need it again. So the move of the texture may just be a waste
> > of time. It would be better to just kick the texture and upload it again
> > later when it's really needed.
> 
> I suspect this extra texture copy wouldn't be noticeable except when
> you construct a test program which artificially triggers it. Most
> games will achieve a steady state with their loaded textures after a
> frame or two and the copies will stop.

Still this copy is unnecessary at the time. Delaying the re-upload to
the time when the texture is needed again has only advantages and is not
difficult to implement.

> 
> > I'd rather reverse your scheme. Upload a texture to the GART heap first,
> > because that's potentially faster (though not with the current
> > implementation in the radeon drivers). When the texture is needed more
> > frequently, try promoting it to the local texture heap.
> 
> I thought about this, but there is no automatic way to figure out when
> to promote from GART to local.

Yes there is. In the current scheme, whenever a texture is bound to a
hardware tex unit the driver calls driUpdateTexLRU, which moves the
texture to the front of the LRU list. In this function you could easily
count how often or how frequently a texture has been used. Based on this
information and maybe the texture size you could decide which textures
to promote and when. You will keep promoting textures until the local
heap is full of non-stale textures.
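
A minimal sketch of that bookkeeping, under stated assumptions: the struct, the function names and the threshold of 8 uses are all hypothetical and are not part of texmem.c. It counts binds per texture, resets the count when the image is changed (so frequently updated textures stay in GART), and only promotes when the local heap has room that is free or reclaimable from stale textures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum tex_heap { HEAP_GART, HEAP_LOCAL };

struct tex_obj {
    enum tex_heap heap;   /* where the texture currently lives */
    size_t size;          /* bytes occupied on the heap */
    unsigned use_count;   /* binds since the last image upload */
};

#define PROMOTE_AFTER_USES 8u   /* hypothetical tunable */

/* Would be called from the place where the LRU position is updated. */
static void tex_note_use(struct tex_obj *t)
{
    assert(t != NULL);
    t->use_count++;
}

/* Would be called whenever the texture image is replaced. */
static void tex_note_upload(struct tex_obj *t)
{
    assert(t != NULL);
    t->use_count = 0;
}

/* local_reclaimable: bytes free or held by stale textures on the local heap. */
static bool tex_should_promote(const struct tex_obj *t, size_t local_reclaimable)
{
    assert(t != NULL);
    return t->heap == HEAP_GART &&
           t->use_count >= PROMOTE_AFTER_USES &&
           t->size <= local_reclaimable;
}
```

A frequency measure (uses per second) could replace the plain counter; the counter is used here only because it is the simplest thing that shows the idea.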

>  Same problem when local overflows, what
> do you demote to AGP? You still have copies with this scheme too.

Textures are sorted in LRU-order on the texture heaps. So you always
kick least recently used textures first. It has always worked like this
even in the current scheme. For promoting textures I would only kick
stale textures from the local heap.

> 
> Going first to local and then demoting to AGP sorts everything
> automatically. It may cause a little more churn in the heaps,

In my experience texture uploads are quite expensive. So IMO avoiding
unnecessary texture uploads or copies should have a high priority.

>  but the
> advantage is that the algorithm is very simple and doesn't need much
> tuning. The only tunable parameter is determining when the top of the
> AGP heap is "hot" and booting it. You could use something simple like
> boot after 500 accesses.

I don't think my algorithm is much more complicated. It can be
implemented by gradual improvements of the current algorithm (freeing
stale texture memory is one step) which helps avoiding unexpected
performance regressions. At the moment I'm not planning to rewrite it
from scratch, especially because I can't test on any hardware where I
can actually measure great performance improvements ATM.

The only tunable parameter in my algorithm is how often/frequently used
a texture must be in order to try to promote it to the local texture
heap. Maybe there are a few more degrees of freedom, because you can
also consider the texture size for promotion. I think the steady state
result would be about the same as with your algorithm, but I expect my
scheme to work better when textures are used very infrequently or
updated very frequently (movie players). In particular this would make
the texture_h

Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Jon Smirl
On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling <[EMAIL PROTECTED]> wrote:
> This scheme would give good results with movie players that need fast
> texture uploads and typically use each texture exactly once. It would

Movie players aren't even close to being texture bandwidth bound. The
demote from local to AGP scheme would cause two copies on each frame
but there is plenty of bandwidth. But this assumes that the movie
player creates a new texture for each frame.

A better scheme for a movie player would be to create a single texture
and then keep replacing its contents. Or use two textures and double
buffer. But once created these textures would not move in the LRU list
unless you started something like a game in another window.

-- 
Jon Smirl
[EMAIL PROTECTED]




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Jon Smirl
On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling <[EMAIL PROTECTED]> wrote:
> This means you copy a texture when you don't know if or when you're
> going to need it again. So the move of the texture may just be a waste
> of time. It would be better to just kick the texture and upload it again
> later when it's really needed.

I suspect this extra texture copy wouldn't be noticeable except when
you construct a test program which artificially triggers it. Most
games will achieve a steady state with their loaded textures after a
frame or two and the copies will stop.

> I'd rather reverse your scheme. Upload a texture to the GART heap first,
> because that's potentially faster (though not with the current
> implementation in the radeon drivers). When the texture is needed more
> frequently, try promoting it to the local texture heap.

I thought about this, but there is no automatic way to figure out when
to promote from GART to local. Same problem when local overflows, what
do you demote to AGP? You still have copies with this scheme too.

Going first to local and then demoting to AGP sorts everything
automatically. It may cause a little more churn in the heaps, but the
advantage is that the algorithm is very simple and doesn't need much
tuning. The only tunable parameter is determining when the top of the
AGP heap is "hot" and booting it. You could use something simple like
boot after 500 accesses.

-- 
Jon Smirl
[EMAIL PROTECTED]




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Felix Kühling
On Thursday, 10.02.2005, at 15:31 -0500, Jon Smirl wrote:
> I haven't looked at the texture heap management code, but one simple
> idea for heap management would be to cascade the on-board heap to the
> AGP one. How does the current algorithm work? Does an algorithm like
> the one below have merit? It should sort the hot textures on-board,
> and single use textures should fall out of the cache.
> 
> 1) load all textures initially in the on-board heap. Since if you are
> loading them you're probably going to use them.

Drivers usually upload textures to the hardware just before binding them
to a hardware texture unit. So this assumption is always true.

> 2) Do LRU with the on-board heap. 
> 3) When you run out of space on-board, demote the end of the LRU list
> to the top of the AGP heap and copy the texture between heaps.

This means you copy a texture when you don't know if or when you're
going to need it again. So the move of the texture may just be a waste
of time. It would be better to just kick the texture and upload it again
later when it's really needed.

> 4) Run LRU on the AGP heap.
> 5) When it runs out of space lose the item.
> 6) an added twist would be if the top of the AGP heap gets hit too
> often knock it out of cache so that it will get reloaded on-board.

I'd rather reverse your scheme. Upload a texture to the GART heap first,
because that's potentially faster (though not with the current
implementation in the radeon drivers). When the texture is needed more
frequently, try promoting it to the local texture heap.

This scheme would give good results with movie players that need fast
texture uploads and typically use each texture exactly once. It would
also improve performance with games, simulations, ... that tend to use
the same textures many times and benefit from the higher memory
bandwidth when accessing local textures.

> 
> 
> Jon Smirl
> [EMAIL PROTECTED]
> 

-- 
| Felix Kühling <[EMAIL PROTECTED]> http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3  B152 151C 5CC1 D888 E595 |





Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Jon Smirl
I haven't looked at the texture heap management code, but one simple
idea for heap management would be to cascade the on-board heap to the
AGP one. How does the current algorithm work? Does an algorithm like
the one below have merit? It should sort the hot textures on-board,
and single use textures should fall out of the cache.

1) load all textures initially in the on-board heap. Since if you are
loading them you're probably going to use them.
2) Do LRU with the on-board heap. 
3) When you run out of space on-board, demote the end of the LRU list
to the top of the AGP heap and copy the texture between heaps.
4) Run LRU on the AGP heap.
5) When it runs out of space lose the item.
6) an added twist would be if the top of the AGP heap gets hit too
often knock it out of cache so that it will get reloaded on-board.
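
Steps 1 to 5 above can be sketched as two small LRU arrays, with the on-board heap cascading into the AGP heap. This is only an illustration under assumptions: texture ids are non-negative integers, heaps hold a fixed number of equally sized slots, all names are hypothetical, and step 6 (booting a hot AGP texture back on-board) is left out.

```c
#include <assert.h>

#define HEAP_MAX 8

struct lru_heap {
    int ids[HEAP_MAX];  /* ids[0] is the most recently used */
    int count;          /* slots currently occupied */
    int capacity;       /* usable slots, 1..HEAP_MAX */
};

/* Index of id in the heap, or -1 if it is not resident. */
static int heap_find(const struct lru_heap *h, int id)
{
    for (int i = 0; i < h->count; i++)
        if (h->ids[i] == id)
            return i;
    return -1;
}

/* Remove the entry at idx, keeping the order of the rest. */
static void heap_remove_at(struct lru_heap *h, int idx)
{
    assert(idx >= 0 && idx < h->count);
    for (int i = idx; i + 1 < h->count; i++)
        h->ids[i] = h->ids[i + 1];
    h->count--;
}

/* Insert id at the head; returns the evicted LRU id, or -1 if none. */
static int heap_push_front(struct lru_heap *h, int id)
{
    int evicted = -1;
    assert(h->capacity > 0 && h->capacity <= HEAP_MAX);
    if (h->count == h->capacity) {
        evicted = h->ids[h->count - 1];
        h->count--;
    }
    for (int i = h->count; i > 0; i--)
        h->ids[i] = h->ids[i - 1];
    h->ids[0] = id;
    h->count++;
    return evicted;
}

/* Touch a texture; returns the id dropped from both heaps, or -1. */
static int use_texture(struct lru_heap *local, struct lru_heap *agp, int id)
{
    int idx = heap_find(local, id);
    if (idx >= 0) {                     /* step 2: LRU on the on-board heap */
        heap_remove_at(local, idx);
        heap_push_front(local, id);
        return -1;
    }
    idx = heap_find(agp, id);
    if (idx >= 0) {                     /* step 4: LRU on the AGP heap */
        heap_remove_at(agp, idx);
        heap_push_front(agp, id);
        return -1;
    }
    /* step 1: new textures always go on-board first */
    int demoted = heap_push_front(local, id);
    if (demoted < 0)
        return -1;
    /* step 3: demote to the top of the AGP heap; step 5: its tail is lost */
    return heap_push_front(agp, demoted);
}
```

A real heap would track byte sizes and perform the actual copy between heaps at the demotion point; the fixed slots here just keep the ordering logic visible.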


Jon Smirl
[EMAIL PROTECTED]




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Felix Kühling
On Wednesday, 09.02.2005, at 22:12 +0100, Felix Kühling wrote:
> On Wednesday, 09.02.2005, at 20:58 +0100, Roland Scheidegger wrote:
[snip] 
> > Performance with gart texturing, even in 4x mode, takes a big hit 
> > (almost 50%).
> > I was not really able to get consistent performance results when both 
> > texture heaps were active, I guess it's luck of the day which textures 
> > got put in the gart heap and which ones in the local heap. But that 
> > performance indeed got faster with a smaller gart heap is not a good 
> > sign. And even if the maximum obtained in rtcw with 35MB local heap and 
> > 29MB gart heap was higher than the score obtained with 35MB local heap 
> > alone, there were clearly areas which ran faster with only the local heap.
> > It seems to me that the allocator really should try harder to use the 
> > local heap to be useful on r200 cards, moreover it is likely that you'd 
> > get quite a bit better performance when you DO have to put textures into 
> > the gart heap when you revisit that later when more space becomes 
> > available on the local heap and upload the still-used textures from the 
> > gart heap to the local heap (in fact, should be even faster than those 
> > 650MB/s, since no in-kernel-copy would be needed, it should be possible 
> > to blit it directly).
> 
> The big problem with the current texture allocator is that it can't tell
> which areas are really unused. Texture space is only allocated and never
> freed. Once the memory is "full" it starts kicking textures to upload
> new ones. This is the only way of "freeing" memory. Using an LRU
> strategy it has a good chance of kicking unused textures first, but
> there's no guarantee. It can't tell if a kicked texture will be needed
> the next instant. So trying to move textures from GART to local memory
> would basically mean that you blindly kick the least recently used
> texture(s) from local memory. If those textures are needed again soon
> then performance is going to suffer badly.
> 
> Therefore I'm proposing a modified allocator that fails when it needs to
> start kicking too recently used textures (e.g. textures used in the
> current or previous frame). Failure would not be fatal in this case, you
> just keep the texture in GART memory and try again later. Actually you
> could use the same allocator for normal texture uploads. Just specify
> the current texture heap age as the limit.
> 
> If you try to move textures back to local memory each time a texture is
> used, this would result in some kind of automatic regulation of heap
> usage. By kicking only textures that are several frames old in this
> process, you'd avoid thrashing.
> 
> Currently the texture heap age is only incremented on lock contention
> (IIRC). In this scheme you'd also increment it on buffer swaps and
> remember the texture heap ages of the last two buffer swaps.

I simplified this idea a little further and attached a patch against
texmem.[ch]. It frees stale textures (and also place holders for other
clients' textures) that haven't been used in 1 second when it runs out of
space on a texture heap. This way it will try a bit harder to put
textures into the first heap before using the second heap, without much
risk (I hope) of performance regressions.
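
The idea can be sketched roughly as follows. This is not the attached patch: the entry struct, the function names and the flat array are assumptions made for illustration, and only the 1 second limit and the use of gettimeofday for a timestamp in seconds follow the description.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

#define STALE_SECONDS 1.0

/* Hypothetical stand-in for a texture (or placeholder) on a heap. */
struct tex_entry {
    double clock_age;   /* time of last use, in seconds */
    size_t size;        /* bytes occupied on the heap */
    int    in_use;      /* nonzero while the slot holds a texture */
};

/* Wall-clock time in seconds, or 0.0 if the clock cannot be read. */
static double now_seconds(void)
{
    struct timeval tv;
    if (gettimeofday(&tv, NULL) != 0)
        return 0.0;
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Free every entry unused for more than STALE_SECONDS.
 * Returns the number of bytes released. */
static size_t free_stale(struct tex_entry *e, size_t n, double now)
{
    size_t freed = 0;
    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (e[i].in_use && now - e[i].clock_age > STALE_SECONDS) {
            e[i].in_use = 0;
            freed += e[i].size;
        }
    }
    return freed;
}
```

An allocator would call something like free_stale(entries, n, now_seconds()) only after a heap reports that it is full, so textures used within the last second are never kicked by this path.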

I tested this on a ProSavageDDR where rendering speed appears to be the
same with local and GART textures. There was no measurable performance
regression in Quake3 and I noticed no subjective performance regression
in Torcs or Quake1 either.

Now the only thing missing in texmem.c for migrating textures from GART
to local memory would be a flag to driAllocateTexture to stop trying if
kicking stale textures didn't free up enough space (on the first texture
heap).

Anyway, I think the attached patch should already make a difference as
it is. I'd be interested how much it improves your performance numbers
with Quake3 and rtcw on r200 when both texture heaps are enabled.

> 
[snip]

Regards,
  Felix

-- 
| Felix Kühling <[EMAIL PROTECTED]> http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3  B152 151C 5CC1 D888 E595 |
--- ./texmem.h.~1.6.~	2005-02-02 17:20:40.0 +0100
+++ ./texmem.h	2005-02-10 17:44:40.0 +0100
@@ -101,6 +101,11 @@
 	 * value must be greater than
 	 * or equal to \c firstLevel.
 	 */
+
+	double  clockAge;		/**< Clock time stamp indicating when
+	 * the texture was last used. The unit
+	 * is seconds.
+	 */
 };
 
 
--- ./texmem.c.~1.10.~	2005-02-05 14:16:25.0 +0100
+++ ./texmem.c	2005-02-10 18:39:15.0 +0100
@@ -50,6 +50,7 @@
 #include "texformat.h"
 
 #include 
+#include 
 
 
 
@@ -243,6 +244,13 @@
*/
 
   move_to_head( & heap->texture_objects, t );
+  {
+	 struct timeval tv;
+	 if ( gettimeofday( &tv, NULL ) == 0 ) {
+	t->clockAge = (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
+	 } else
+	t->clockAge = 0.0;
+  }
 
 
   for (i = start ; i <= end ; i++) {
@@ -