----- Original Message ----
> From: Dave Airlie <[EMAIL PROTECTED]>
> To: Ian Romanick <[EMAIL PROTECTED]>
> Cc: DRI <dri-devel@lists.sourceforge.net>
> Sent: Monday, May 19, 2008 4:38:02 AM
> Subject: Re: TTM vs GEM discussion questions
> 
> 
> > 
> > All the good that's done us and our users.  After more than *5 years* of
> > various memory manager efforts we can't support basic OpenGL 1.0 (yes,
> > 1.0) functionality in a performant manner (i.e., glCopyTexImage and
> > friends).  We have to get over this "it has to be perfect or it will
> > never get in" crap.  Our 3D drivers are entirely irrelevant at this point.
> 
> Except on Intel hardware, whose relevance may or may not be relevant.

These can't do copyteximage with the in-kernel drm.  


> > To say that "userspace APIs cannot die once released" is not a relevant
> > counterpoint.  We're not talking about a userspace API for general
> application use.  This isn't futexes, sysfs, or anything that
> > applications will directly depend upon.  This is an interface between a
> > kernel portion of a driver and a usermode portion of a driver.  If we
> > can't be allowed to change or deprecate those interfaces, we have no hope.
> > 
> > Note that the closed source guys don't have this artificial handicap.
> 
> Ian, fine you can take this up with Linus and Andrew Morton, I'm not 
> making this up just to stop you from putting 50 unsupportable memory 
> managers in the kernel. If you define any interface to userspace from the 
> kernel (ioctls, syscalls), you cannot just make it go away. The rule is 
> simple and is that if you install a distro with a kernel 2.6.x.distro, and 
> it has Mesa 7.0 drivers on it, upgrading the kernel to kernel 2.6.x+n 
> without touching userspace shouldn't break userspace ever. If we can't 
> follow this rule we can't put our code into Linus's kernel. So don't argue 
> about it, deal with it, this isn't going to change.
> 
> and yes I've heard this crap about closed source guys, but we can't follow 
> their route and be distributed by vendors. How many vendors ship the 
> closed drivers?
> 
> > This is also a completely orthogonal issue to maintaining any particular
> > driver.  Drivers are removed from the kernel just the same as they are
> > removed from X.org.  Assume we upstreamed either TTM or GEM "today."
> > Clearly that memory manager would continue to exist as long as some
> > other driver continued to depend on it.  I don't see how this is
> > different from cfb or any of the other interfaces within the X server
> > that we've gutted recently.
> 
> Drivers and pieces of the kernel aren't removed like you think. I think we 
> nuked gamma (didn't have a working userspace anymore) and ffb (it sucked 
> and couldn't  be fixed). Someone is bound to bring up OSS->ALSA, but that 
> doesn't count as ALSA had OSS emulation layer so userspace apps didn't 
> just stop working. Removing chunks of X is vastly different to removing an 
> exposed kernel userspace interface. Please talk to any IBM kernel person 
> and clarify how this stuff works. (Maybe benh could chime in...??)
> 
> > If you want to remove a piece of infrastructure, you have three choices.
> If nothing uses it, you gut it.  If something uses it, you either fix
> > that "something" to use different infrastructure (which puts you in the
> > "nothing uses it" state) or you leave things as they are.  In spite of
> all the fussing the kernel guys do, the kernel isn't
> > different in this respect from any other large, complex piece of
> > infrastructure.
> 
> So you are going to go around and fix the userspaces on machines that are 
> already deployed? How? e.g. Andrew Morton has a Fedora Core 1 install on a 
> laptop booting 2.6.x-mm kernels, when 3D stops working on that laptop we 
> get to hear about it. So yes you can redesign and move around the kernel 
> internals as much as you like, but you damn well better expose the old 
> interface and keep it working.
> 
> > managers or that we may want to have N memory managers now that will be
> > gutted later.  It seems that the real problem is that the memory
> > managers have been exposed as a generic, directly usable, device
> > independent piece of infrastructure.  Maybe the right answer is to punt
> > on the entire concept of a general memory manager.  At best we'll have
> > some shared, optional use infrastructure, and all of the interfaces that
> > anything in userspace can ever see are driver dependent.  That limits
> > the exposure of the interfaces and lets us solve todays problems today.
> > 
> > As is trivially apparent, we don't know what the "best" (for whatever
> > definition of best we choose) answer is for a memory manager interface.
> We're probably not going to know that answer in the near future.  To
> > not let our users have anything until we can give them the best thing is
> > an incredible disservice to them, and it makes us look silly (at best).
> > 
> 
> Well the thing is I can't believe we don't know enough to do this in some 
> way generically, but maybe the TTM vs GEM thing proves it's not possible. 

I don't think there's anything particularly wrong with the GEM interface -- I 
just need to know that the implementation can be fixed so that performance 
doesn't suck as hard as it does in the current one, and that people's political 
views on basic operations like mapping buffers don't get in the way of writing 
a decent driver.

We've run a few benchmarks against i915 drivers in all their permutations, and 
to summarize, the results look like this:
    - for GPU-bound apps, there are small differences, perhaps up to 10%.  I'm 
really not concerned about these (yet).
    - for CPU-bound apps, the overheads introduced by Intel's approach to 
buffer handling impose a significant penalty in the region of 50-100%.

I think the latter is the significant result -- none of these experiments in 
memory management significantly change the command stream the hardware has to 
operate on, so what we're varying essentially is the CPU behaviour to achieve 
that command stream.  And it is in CPU usage where GEM (and Keith/Eric's 
now-abandoned TTM driver) do significantly disappoint.

Or to put it another way, GEM & master/TTM seem to burn huge
amounts of CPU just running the memory manager.  This isn't true for 
master/no-ttm or for i915tex using userspace sub-allocation, where the CPU 
penalty for getting decent memory management seems to be minimal relative to 
the non-ttm baseline.  
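
For readers unfamiliar with the term, here is a minimal sketch of what
userspace sub-allocation means in this context.  It is not the actual i915tex
code; the names (kernel_bo_alloc, pool_create, pool_alloc) are invented for
illustration.  The point is simply that only the pool itself ever touches the
kernel:

/*
 * Hedged sketch, not the real i915tex code: pay the kernel allocation
 * cost once for a large pool buffer, then hand out small ranges from it
 * entirely in userspace, so per-draw allocations never need an ioctl.
 */
#include <stddef.h>
#include <stdlib.h>

struct kernel_bo { void *cpu_map; };   /* stand-in for a real buffer handle */

static struct kernel_bo *kernel_bo_alloc(size_t size)
{
    /* In a real driver this would be one buffer-create ioctl; plain
     * malloc stands in here so the sketch is self-contained. */
    struct kernel_bo *bo = malloc(sizeof(*bo));
    if (bo)
        bo->cpu_map = malloc(size);
    return bo;
}

struct pool {
    struct kernel_bo *bo;   /* one big kernel-visible buffer */
    size_t size;
    size_t head;            /* bump pointer; real code also tracks frees */
};

static struct pool *pool_create(size_t size)
{
    struct pool *p = malloc(sizeof(*p));
    if (!p)
        return NULL;
    p->bo = kernel_bo_alloc(size);      /* the only kernel round trip */
    p->size = size;
    p->head = 0;
    return p;
}

/* Returns an offset into the pool, or (size_t)-1 when the pool is full.
 * align must be a power of two.  No ioctl, no syscall -- which is why
 * the CPU cost stays near zero. */
static size_t pool_alloc(struct pool *p, size_t size, size_t align)
{
    size_t offset = (p->head + align - 1) & ~(align - 1);
    if (offset + size > p->size)
        return (size_t)-1;
    p->head = offset + size;
    return offset;
}

A real implementation would also need to fence sub-ranges against the GPU and
reuse freed space, but none of that changes the basic property that allocation
stays out of the kernel.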

If there's a political desire to not use userspace sub-allocation, then 
whatever kernel-based approach you want to investigate should nonetheless make 
some effort to hit reasonable performance goals -- and neither of the two 
current kernel-allocation-based approaches is at all impressive.

Keith


==============================================================
And on an i945G, dual-core Pentium D 3 GHz, 2 MB cache, FSB 800 MHz, 
single-channel RAM:


Openarena timedemo at 640x480:
--------------------------------------------
master w/o TTM:  840 frames, 17.1 seconds: 49.0 fps, 12.24s user 1.02s system 
63% cpu 20.880 total
master with TTM: 840 frames, 15.8 seconds: 53.1 fps, 13.51s user 5.15s system 
95% cpu 19.571 total
i915tex_branch:  840 frames, 13.8 seconds: 61.0 fps, 12.54s user 2.34s system 
85% cpu 17.506 total
gem:             840 frames, 15.9 seconds: 52.8 fps, 11.96s user 4.44s system 
83% cpu 19.695 total

KW:  It's less obvious here than some of the tests below, but the pattern is 
still clear -- compared to master/no-ttm, i915tex is getting about the same 
ratio of fps to CPU usage, whereas both master/ttm and gem are significantly 
worse, burning much more CPU per fps, with a large chunk of the extra CPU being 
spent in the kernel.  
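
As a back-of-envelope check (my own arithmetic, using only the user + system
figures quoted above for the 640x480 run), here is the CPU time per frame for
each configuration:

/* Quick arithmetic on the 640x480 timedemo numbers above:
 * 840 frames each, user + system CPU seconds as reported. */
#include <stdio.h>

int main(void)
{
    struct { const char *name; double user, sys; } runs[] = {
        { "master/no-ttm", 12.24, 1.02 },
        { "master/ttm",    13.51, 5.15 },
        { "i915tex",       12.54, 2.34 },
        { "gem",           11.96, 4.44 },
    };
    const double frames = 840.0;

    for (int i = 0; i < 4; i++) {
        double cpu = runs[i].user + runs[i].sys;
        printf("%-14s %5.2f s CPU, %4.1f ms CPU per frame\n",
               runs[i].name, cpu, 1000.0 * cpu / frames);
    }
    return 0;
}
/* Roughly: no-ttm 15.8 ms/frame, i915tex 17.7, gem 19.5, ttm 22.2. */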

The particularly worrying thing about GEM is that it isn't hitting *either* 
100% cpu *or* maximum framerates from the hardware -- that's really not very 
good, as it implies hardware is being left idle unnecessarily.


glxgears:

A: ~1029 fps, 20.63user 2.88system 1:00.00elapsed 39%CPU  (master, no ttm) 
B: ~1072 fps, 23.97user 18.06system 1:00.00elapsed 70%CPU  (master, ttm)
C: ~1128 fps, 22.38user 5.21system 1:00.00elapsed 45%CPU  (i915tex, new)
D: ~1167 fps, 23.14user 9.07system 1:00.00elapsed 53%CPU  (i915tex, old)
F: ~1112 fps, 24.70user 21.95system 1:00.00elapsed 77%CPU  (gem)

KW: The high CPU overhead imposed by GEM and (non-suballocating) master/TTM 
should be pretty clear here.  master/TTM burns 30% of CPU just running the 
memory manager!!  GEM gets slightly higher framerates but uses even more CPU 
than master/TTM.  
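
The same kind of check applied to the glxgears numbers (again my own
arithmetic, normalising user + system CPU seconds by the total frames rendered
over the 60 s elapsed time):

/* CPU milliseconds per frame for the glxgears runs above:
 * frames = fps * 60 s elapsed, CPU = user + system seconds. */
#include <stdio.h>

int main(void)
{
    struct { const char *name; double fps, user, sys; } runs[] = {
        { "master/no-ttm", 1029.0, 20.63,  2.88 },
        { "master/ttm",    1072.0, 23.97, 18.06 },
        { "i915tex (new)", 1128.0, 22.38,  5.21 },
        { "gem",           1112.0, 24.70, 21.95 },
    };

    for (int i = 0; i < 4; i++) {
        double frames = runs[i].fps * 60.0;
        double cpu    = runs[i].user + runs[i].sys;
        printf("%-15s %.2f ms CPU per frame\n",
               runs[i].name, 1000.0 * cpu / frames);
    }
    return 0;
}
/* Roughly 0.38 ms (no-ttm), 0.41 ms (i915tex), 0.65 ms (ttm), 0.70 ms (gem):
 * the kernel-side managers need about 1.7x-1.8x the CPU per frame of no-ttm. */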

fgl_glxgears -fbo:

A: n/a
B: ~244 fps, 7.03user 5.30system 1:00.01elapsed 20%CPU  (master, ttm)
C: ~255 fps, 6.24user 1.71system 1:00.00elapsed 13%CPU  (i915tex, new)
D: ~260 fps, 6.60user 2.44system 1:00.00elapsed 15%CPU  (i915tex, old)
F: ~258 fps, 7.56user 6.44system 1:00.00elapsed 23%CPU  (gem)

KW: GEM & master/ttm burn more cpu to build/submit the same command streams.

openarena 1280x1024:

A: 840 frames, 44.5 seconds: 18.9 fps  (master, no ttm)
B: 840 frames, 40.8 seconds: 20.6 fps  (master, ttm)
C: 840 frames, 40.4 seconds: 20.8 fps  (i915tex, new)
D: 840 frames, 37.9 seconds: 22.2 fps  (i915tex, old)
F: 840 frames, 40.3 seconds: 20.8 fps  (gem)

KW:  no cpu measurements taken here, but almost certainly GPU bound.  A lot of 
similar numbers; I don't believe the deltas have anything in particular to do 
with memory management interface choices...

ipers:

A: ~285000 Poly/sec (master, no ttm)
B: ~217000 Poly/sec (master, ttm)
C: ~298000 Poly/sec (i915tex, new)
D: ~227000 Poly/sec (i915tex, old)
F: ~125000 Poly/sec (gem, GPU lockup on first attempt)

KW: no cpu measurements in this run, but all are almost certainly 100% pinned 
on CPU.  
  - i915tex (in particular i915tex, new) shows similar performance to classic -- 
i.e. low CPU overhead for this memory manager.
  - GEM is significantly worse even than master/ttm -- hopefully this is a bug 
rather than a necessary characteristic of the interface.

texdown:

A: total texels=393216000.000000  time=3.004000 (master, no ttm)
B: total texels=434110464.000000  time=3.000000 (master, ttm)
C: (i915tex new --- woops, crashes)  
D: total texels=1111490560.000000  time=3.002000 (i915tex old)
F: total texels=279969792.000000  time=3.004000 (gem)

Note the huge (3x-4x) performance lead of i915tex, despite the embarrassing 
crash in the newer version.  I suspect this is unrelated to command handling 
and probably somebody has disabled or regressed some aspect of the texture 
upload path...  

NOTE:  The reason i915tex does so well relative to master/no-ttm is that we 
can upload directly to "VRAM"...  master/no-ttm treats VRAM as a cache & 
always keeps a second copy of the texture safe in main memory...  Hence 
performance isn't great for texture uploads on master/no-ttm.
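
To make the difference concrete, here is a hedged illustration of the two
upload strategies described in that note; the structure and function names are
invented for this sketch and are not taken from either driver:

/* Invented names; only the shape of the two paths matters here. */
#include <stddef.h>
#include <string.h>

struct texture {
    void  *sysmem_copy;  /* backing copy kept in main memory */
    void  *vram_map;     /* CPU mapping of the buffer's VRAM/GTT placement */
    size_t size;
};

/* master/no-ttm style: VRAM is only a cache, so every upload first lands
 * in the safe system-memory copy and is then copied into VRAM -- two
 * passes over the texel data per upload. */
static void upload_via_cache(struct texture *t, const void *pixels)
{
    memcpy(t->sysmem_copy, pixels, t->size);
    memcpy(t->vram_map, t->sysmem_copy, t->size);
}

/* i915tex style: the memory manager tracks the buffer's placement, so the
 * data can be written straight to its final location -- one pass, which
 * is consistent with the texdown gap reported above. */
static void upload_direct(struct texture *t, const void *pixels)
{
    memcpy(t->vram_map, pixels, t->size);
}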


Here's what we're seeing on an i915, 3 GHz Celeron, 256 kB cache, dual-channel 
RAM, Reportdamage disabled, DRM master:

=======================================================================

Test                   | i915tex_branch           | i915 master, TTM         | i915 master, classic
-----------------------+--------------------------+--------------------------+---------------------
gears                  | 1033 fps, 70.1% CPU      | 726 fps, 100% CPU        | 955 fps, 56% CPU
openarena              | 47.1 fps, 17.9u 2.7s     | 31.5 fps, 21.1u 8.7s     | 39 fps, 17.9u 1.3s
Texdown                | 1327 MB/s                | 551 MB/s                 | 572 MB/s
Texdown, subimage      | 1014 MB/s                | 134 MB/s                 | 148 MB/s
Ipers, no help screen  | 255,000 tri/s, 100% CPU  | 139,000 tri/s, 100% CPU  | 241,000 tri/s, 100% CPU

(no gem results on this machine)

I would summarize the results like this:
   - master/no-ttm has a basically "free" memory manager in terms of CPU 
overhead
   - master/ttm and GEM gain a proper memory manager but introduce a huge CPU 
overhead & consequent performance regression
   - i915tex makes use of userspace sub-allocation to resolve that regression & 
achieve comparable efficiency to master/no-ttm.

   - a separate regression seems to have killed texture upload performance on 
master/ttm relative to its ancestor i915tex.

Keith
