----- Original Message -----
> From: Jerome Glisse <[EMAIL PROTECTED]>
> To: Thomas Hellström <[EMAIL PROTECTED]>
> Cc: Dave Airlie <[EMAIL PROTECTED]>; Keith Packard <[EMAIL PROTECTED]>; DRI 
> <dri-devel@lists.sourceforge.net>; Dave Airlie <[EMAIL PROTECTED]>
> Sent: Wednesday, May 14, 2008 6:08:55 PM
> Subject: Re: TTM merging?
> 
> On Wed, 14 May 2008 16:36:54 +0200
> Thomas Hellström wrote:
> 
> > Jerome Glisse wrote:
> > I don't agree with you here. EXA is much faster for small composite 
> > operations and even small fill blits if fallbacks are used. Even to 
> > write-combined memory, but that of course depends on the hardware. This 
> > is going to be even more pronounced with acceleration architectures like 
> > Glucose and similar ones that don't have an optimized path for small 
> > hardware composite operations.
> > 
> > My personal feeling is that pwrites are a workaround for a workaround 
> > for a very bad decision:
> > 
> > To avoid user-space allocators on device-mapped memory. This led to a 
> > hack to avoid caching-policy changes, which led to cache-thrashing 
> > problems, which put us in the current situation. How far are we going to 
> > follow this path before people wake up? What's wrong with the 
> > performance of good old i915tex, which even beats "classic" i915 in many 
> > cases?
> > 
> > Having to go through potentially (and even probably) paged-out memory to 
> > access buffers that are present in VRAM sounds like a very odd approach 
> > (to say the least) to me, even if it's only a single page; and 
> > implementing per-page dirty checks for domain flushing isn't very 
> > appealing either.
> 
> I don't have numbers or benchmarks to check how fast the pread/pwrite path
> might be in this use, so I am just expressing my feeling, which happens to
> be that we should avoid VMA TLB flushes as much as we can. I get the
> feeling that the kernel goes through numerous tricks to avoid TLB flushing
> for a good reason, and I am also pretty sure that with the number of cores
> growing, anything that needs broad CPU synchronization is to be avoided.
> 
> Hopefully once I get a decent amount of time to benchmark GEM I will check
> my theory. I think a simple benchmark can be done on Intel hardware: just
> return FALSE in the EXA PrepareAccess hook to force the use of
> DownloadFromScreen, implement DownloadFromScreen with pread, and then
> compare benchmarks of this hacked intel DDX against a stock one; that
> should already give some numbers. (A rough stand-in for that measurement
> is sketched just below.)
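As a very rough, self-contained stand-in for that experiment -- it touches
neither the GPU nor the intel DDX, only an ordinary temp file, and the buffer
size and iteration count are arbitrary -- the following C program compares
copying a buffer out through an mmap() mapping against pread() on the same
descriptor, which is the shape of the comparison Jerome describes:

/* Proxy benchmark: mmap+memcpy vs pread on the same file.  Purely
 * illustrative; real numbers would need the hacked DDX against GPU
 * memory. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <unistd.h>

#define BUF_SZ (16 * 1024 * 1024)
#define ITERS  64

static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    char path[] = "/tmp/pread-vs-mmap-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0 || ftruncate(fd, BUF_SZ) < 0) {
        perror("setup");
        return 1;
    }
    unlink(path);

    char *dst = malloc(BUF_SZ);
    char *map = mmap(NULL, BUF_SZ, PROT_READ, MAP_SHARED, fd, 0);
    if (!dst || map == MAP_FAILED) {
        perror("alloc");
        return 1;
    }

    double t0 = now_sec();
    for (int i = 0; i < ITERS; i++)
        memcpy(dst, map, BUF_SZ);          /* read through the mapping */
    double t1 = now_sec();
    for (int i = 0; i < ITERS; i++)
        if (pread(fd, dst, BUF_SZ, 0) != BUF_SZ) {   /* copy via syscall */
            perror("pread");
            return 1;
        }
    double t2 = now_sec();

    printf("mmap+memcpy: %.3f s, pread: %.3f s\n", t1 - t0, t2 - t1);
    return 0;
}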
> 
> > Why should we have to when we can do it right?
> 
> Well, my point was that mapping VRAM is not right; I am not saying that
> I know the truth. It's just a feeling based on my experiments with TTM,
> on the BAR restriction issues, and on other considerations of the same
> kind.
> 
> > No. GEM can't cope with it. Let's say you have a 512M system with two 1G 
> > video cards, 4G of swap space, and you want to fill both cards' video RAM 
> > with render-and-forget textures for whatever purpose.
> > 
> > What happens? After you've generated the first, say, 300M, the system 
> > mysteriously starts to page, and when, after a couple of minutes of 
> > crawling texture-upload speeds, you're done, the system is using and has 
> > written almost 2G of swap. Now you want to update the textures and 
> > expect fast texsubimage...
> > 
> > So having a backing object that you have to access to get things into 
> > VRAM is not the way to go.
> > The correct way to do this is to reserve, but not use, swap space. Then 
> > you can start using it on suspend, provided that the swapping system is 
> > still up (which it has to be with the current GEM approach anyway). If 
> > pwrite is used in this case, it must not dirty any backing-object pages.
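A minimal user-space sketch of the policy Thomas describes, with every name
invented for illustration (swap_reserved, vram_bo, bo_evict; real code would
live in the kernel's memory-management paths): swap is accounted for at
creation time, backing pages only materialize on eviction or suspend, and a
pwrite to a resident buffer goes straight to VRAM without touching them.

#include <stdlib.h>
#include <string.h>

static size_t swap_reserved;    /* stand-in for swap-space accounting */

struct vram_bo {
    size_t size;
    void  *vram;                /* stand-in for a VRAM placement      */
    void  *backing;             /* NULL until eviction/suspend        */
};

static struct vram_bo *bo_create(size_t size)
{
    struct vram_bo *bo = calloc(1, sizeof(*bo));
    bo->size = size;
    bo->vram = malloc(size);    /* pretend this is VRAM               */
    swap_reserved += size;      /* reserve swap; allocate no pages    */
    return bo;
}

/* pwrite path for a resident buffer: copy straight to VRAM, leaving
 * the backing store untouched -- and, here, not even allocated. */
static void bo_pwrite(struct vram_bo *bo, size_t off,
                      const void *src, size_t len)
{
    memcpy((char *)bo->vram + off, src, len);
}

/* Only eviction (or suspend) turns the reservation into real pages,
 * filled from the VRAM contents. */
static void bo_evict(struct vram_bo *bo)
{
    if (!bo->backing)
        bo->backing = malloc(bo->size);
    memcpy(bo->backing, bo->vram, bo->size);
    free(bo->vram);
    bo->vram = NULL;
}

int main(void)
{
    struct vram_bo *bo = bo_create(1 << 20);
    char data[64] = "texture bits";

    bo_pwrite(bo, 0, data, sizeof(data));   /* no backing pages touched */
    bo_evict(bo);                           /* backing appears only now */
    return 0;
}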
> > 
> 
> For a normal desktop I don't expect the VRAM amount to exceed the RAM
> amount; people with 1GB of VRAM are usually hardcore gamers with 4GB of
> RAM :). Also, most objects in the 3D world are stored in memory: if
> programs are not stupid and trust GL to keep their textures, then you
> just have the usual RAM copy and possibly a VRAM copy, so I don't see
> any waste in the normal use case. Of course we can always come up with
> crazy, weird setups, but I am more interested in dealing well with the
> average Joe than in dealing mostly well with every use case.

It's always been a big win to go to single-copy texturing.  Textures tend to be 
large, and nobody has so much memory that doubling up on textures has ever been 
appealing...  And there are obvious use-cases like textured video where only 
having a single copy is a big performance win.

It certainly makes things easier for the driver to duplicate textures -- which 
is why all the old DRI drivers did it -- but that doesn't make it right...  And 
the old DRI drivers also copped out on things like render-to-texture, etc., so 
whatever gains you make in simplicity by treating VRAM as a cache, some of 
those will be lost because you'll have to keep track of which of the two 
copies of a texture is up-to-date, and you'll still have to preserve (modified) 
texture contents on eviction, which old DRI never had to do.
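To make that bookkeeping concrete, here is a toy sketch of the per-texture
state a two-copy design has to carry (names invented; a real driver would
track this per buffer object or per page): every GPU write invalidates the
system-memory copy, and evicting a dirty VRAM copy forces a write-back that
the old throw-away DRI model never needed.

#include <stdio.h>

enum copy_state { COPY_INVALID, COPY_CLEAN, COPY_DIRTY };

struct texture {
    enum copy_state sysmem;     /* state of the RAM copy  */
    enum copy_state vram;       /* state of the VRAM copy */
};

/* Render-to-texture on the GPU: the VRAM copy is now the only
 * up-to-date one. */
static void tex_gpu_write(struct texture *t)
{
    t->vram   = COPY_DIRTY;
    t->sysmem = COPY_INVALID;
}

/* Eviction: a dirty VRAM copy must be written back before it can be
 * dropped, or the rendered contents are lost. */
static void tex_evict(struct texture *t)
{
    if (t->vram == COPY_DIRTY) {
        /* copy VRAM -> system memory (transfer elided in this toy) */
        t->sysmem = COPY_CLEAN;
    }
    t->vram = COPY_INVALID;
}

int main(void)
{
    struct texture t = { COPY_CLEAN, COPY_CLEAN };

    tex_gpu_write(&t);
    tex_evict(&t);
    printf("sysmem=%d vram=%d\n", t.sysmem, t.vram);  /* prints 1 0 */
    return 0;
}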

Ultimately it boils down to a choice between making your life easier as a 
driver developer and producing a driver that takes best advantage of all 
the system resources.

Nobody can force you to take one path or the other, but it is certainly my 
intention, when considering drivers for VRAM hardware, to support 
single-copy textures, and for that reason I'd be unhappy to see a 
system adopted that prevented that.

Keith

