Re: TTM merging?

Eric Anholt Wed, 14 May 2008 10:49:17 -0700

On Wed, 2008-05-14 at 10:21 -0700, Keith Whitwell wrote:
> 
> ----- Original Message ----
> > From: Jerome Glisse <[EMAIL PROTECTED]>
> > To: Thomas Hellström <[EMAIL PROTECTED]>
> > Cc: Dave Airlie <[EMAIL PROTECTED]>; Keith Packard <[EMAIL PROTECTED]>; DRI 
> > <dri-devel@lists.sourceforge.net>; Dave Airlie <[EMAIL PROTECTED]>
> > Sent: Wednesday, May 14, 2008 6:08:55 PM
> > Subject: Re: TTM merging?
> > 
> > On Wed, 14 May 2008 16:36:54 +0200
> > Thomas Hellström wrote:
> > 
> > > Jerome Glisse wrote:
> > > I don't agree with you here. EXA is much faster for small composite 
> > > operations and even small fill blits if fallbacks are used, even to 
> > > write-combined memory, but that of course depends on the hardware. This 
> > > is going to be even more pronounced with acceleration architectures like 
> > > Glucose and similar ones that don't have an optimized path for small 
> > > hardware composite operations.
> > > 
> > > My personal feeling is that pwrites are a workaround for a workaround 
> > > for a very bad decision:
> > > 
> > > To avoid user-space allocators on device-mapped memory. This led to a 
> > > hack to avoid caching-policy changes, which led to cache thrashing 
> > > problems, which put us in the current situation. How far are we going to 
> > > follow this path before people wake up? What's wrong with the 
> > > performance of good old i915tex, which even beats "classic" i915 in many 
> > > cases?
> > > 
> > > Having to go through potentially (and even probably) paged-out memory to 
> > > access buffers that are present in VRAM sounds like a very odd 
> > > approach (to say the least) to me. Even if it's a single page, 
> > > implementing per-page dirty checks for domain flushing isn't very 
> > > appealing either.
> > 
> > I don't have numbers or benchmarks to check how fast the pread/pwrite path
> > might be in this use, so I am just expressing my feeling, which happens to
> > be that we should avoid vma TLB flushes as much as we can. I get the feeling
> > that the kernel goes through numerous tricks to avoid TLB flushing for a
> > good reason, and I am also pretty sure that, with the number of cores
> > growing, anything that needs CPU-wide synchronization is to be avoided.
> > 
> > Hopefully once I get a decent amount of time to do benchmarks with GEM I
> > will check out my theory. I think a simple benchmark can be done on Intel
> > hw: just return false in EXA prepare access to force the use of download
> > from screen, and use pread in download from screen. Comparing a benchmark
> > of this hacked Intel DDX with a normal one should already give some numbers.
> > 
> > > Why should we have to when we can do it right?
> > 
> > Well, my point was that mapping VRAM is not right; I am not saying that
> > I know the truth. It's just a feeling based on my experiments with TTM,
> > on the BAR restriction stuff, and on other considerations of the same kind.
> > 
> > > No. GEM can't cope with it. Let's say you have a 512M system with two 1G 
> > > video cards, 4G of swap space, and you want to fill both cards' VRAM 
> > > with render-and-forget textures for whatever purpose.
> > > 
> > > What happens? After you've generated the first, say, 300M, the system 
> > > mysteriously starts to page, and when, after a couple of minutes of 
> > > crawling texture upload speeds, you're done, the system is using and 
> > > has written almost 2G of swap. Now you want to update the textures and 
> > > expect fast texsubimage...
> > > 
> > > So having a backing object that you have to access to get things into 
> > > VRAM is not the way to go.
> > > The correct way to do this is to reserve, but not use, swap space. Then 
> > > you can start using it on suspend, provided that the swapping system is 
> > > still up (which it has to be with the current GEM approach anyway). If 
> > > pwrite is used in this case, it must not dirty any backing object pages.
> > > 
> > 
> > For a normal desktop I don't expect the amount of VRAM to exceed the amount
> > of RAM; people with 1GB of VRAM are usually hardcore gamers with 4GB of RAM :).
> > Also, most objects in the 3D world are stored in memory: if programs are
> > not stupid and trust GL to keep their textures, then you just have the usual
> > RAM copy and possibly a VRAM copy, so I don't see any waste in the normal
> > use case. Of course we can always come up with crazy weird setups, but I am
> > more interested in dealing well with the average Joe than in dealing mostly
> > well with every use case.
> 
> It's always been a big win to go to single-copy texturing.  Textures
> tend to be large and nobody has so much memory that doubling up on
> textures has ever been appealing...  And there are obvious use-cases
> like textured video where only having a single copy is a big
> performance win.


So upload it with pwrite.  Have your driver implementation of pwrite
make some VRAM space, copy it directly in, and mark it as needing to be
synced to backing store if evicted.  You haven't even loaded the pages
of the backing store in, so you haven't allocated that memory.  I'm not
a big fan of this because it seems to leave nasty problems with scaring
up enough memory when you go to suspend/evict, but I'm not the person
writing your driver so it's not my decision.
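A minimal user-space sketch of the pwrite-into-VRAM idea above, assuming VRAM is simulated with a plain heap allocation. All names (struct sketch_bo, sketch_bo_pwrite, sketch_bo_evict, and so on) are hypothetical and not part of GEM, TTM, or any driver API. It shows the backing store staying unallocated after a pwrite and only being allocated and synced at eviction, which is exactly where the "scaring up enough memory" failure can occur.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer object: a VRAM copy (simulated with malloc) plus a
 * lazily allocated system-memory backing store. */
struct sketch_bo {
    size_t size;
    unsigned char *vram;    /* stand-in for VRAM; NULL once evicted */
    unsigned char *backing; /* NULL until eviction actually needs it */
    bool backing_stale;     /* VRAM holds data not yet in backing */
};

static int sketch_bo_init(struct sketch_bo *bo, size_t size)
{
    bo->size = size;
    bo->vram = calloc(1, size);
    bo->backing = NULL;
    bo->backing_stale = false;
    return bo->vram ? 0 : -1;
}

/* pwrite: copy straight into VRAM and mark the backing store stale,
 * without allocating or touching any backing pages. */
static int sketch_bo_pwrite(struct sketch_bo *bo, size_t offset,
                            const void *data, size_t len)
{
    if (offset > bo->size || len > bo->size - offset)
        return -1;
    if (bo->vram == NULL)
        return -1; /* sketch only handles resident objects */
    memcpy(bo->vram + offset, data, len);
    bo->backing_stale = true;
    return 0;
}

/* Evict: only now allocate backing memory and sync the VRAM contents.
 * If the allocation fails, the object stays resident. */
static int sketch_bo_evict(struct sketch_bo *bo)
{
    if (bo->vram == NULL)
        return 0;
    if (bo->backing_stale) {
        if (bo->backing == NULL) {
            bo->backing = malloc(bo->size);
            if (bo->backing == NULL)
                return -1; /* could not find memory at evict time */
        }
        memcpy(bo->backing, bo->vram, bo->size);
        bo->backing_stale = false;
    }
    free(bo->vram);
    bo->vram = NULL;
    return 0;
}

/* pread: serve from VRAM if resident, else from the backing store;
 * a never-written, evicted object reads as zeros. */
static int sketch_bo_pread(const struct sketch_bo *bo, size_t offset,
                           void *out, size_t len)
{
    if (offset > bo->size || len > bo->size - offset)
        return -1;
    /* Invariant: an evicted object never has unsynced VRAM data. */
    assert(bo->vram != NULL || !bo->backing_stale);
    if (bo->vram != NULL)
        memcpy(out, bo->vram + offset, len);
    else if (bo->backing != NULL)
        memcpy(out, bo->backing + offset, len);
    else
        memset(out, 0, len);
    return 0;
}

static void sketch_bo_fini(struct sketch_bo *bo)
{
    free(bo->vram);
    free(bo->backing);
    bo->vram = bo->backing = NULL;
}
```

The trade-off Eric describes shows up in sketch_bo_evict: the upload path is cheap and allocates nothing, but the cost is deferred to eviction or suspend, where the malloc may fail.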

-- 
Eric Anholt                             [EMAIL PROTECTED]
[EMAIL PROTECTED]                         [EMAIL PROTECTED]


--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
