On Thu, Dec 05, 2002 at 04:40:00PM -0800, magenta wrote:
> On Thu, Dec 05, 2002 at 03:56:09PM -0800, Ian Romanick wrote:
> > > 
> > > But it's not even supported in the DRI driver on the R100...  It's not like
> > > the wrapper can magically make functionality which isn't there to begin
> > > with appear, but in order to do the tweak in the driver itself, the driver
> > > would have to support it anyway!  Unless I'm totally missing something
> > > about how FSAA is done in OpenGL, in which case I'd love for someone to
> > > explain.  All of the documents I've found on the web indicate that
> > > GL_ARB_multisample is the way you do it, even in Windows.
> > 
> > It is one way.  It's the way that the OpenGL ARB has sanctified with an
> > extension.  It's not the only way.  On Windows with a Radeon, for example,
> > if you click the 'FSAA 2x' box, it will tell the driver to render to a
> > buffer twice as wide as requested and scale down when it does a blit for the
> > swap-buffer call.  'FSAA 4x' does 2x width & 2x height.  This is called
> > super-sampling.
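
(Roughly, the downscale half of that looks like the sketch below -- my
illustration, not ATI's code -- just to show there's nothing GL-visible
about it:)

  /* Sketch only: the downscale step of 2x supersampling as described
   * above.  The scene was rendered into a back buffer twice the
   * requested width; at swap time each pair of horizontally adjacent
   * samples is averaged into one screen pixel. */
  #include <stdint.h>

  static void downsample_2x_wide(const uint32_t *src, uint32_t *dst,
                                 int width, int height)
  {
      int x, y;
      for (y = 0; y < height; y++) {
          for (x = 0; x < width; x++) {
              uint32_t a = src[y * width * 2 + 2 * x];
              uint32_t b = src[y * width * 2 + 2 * x + 1];
              /* average each 8-bit channel without cross-channel carry */
              dst[y * width + x] = ((a >> 1) & 0x7f7f7f7fu)
                                 + ((b >> 1) & 0x7f7f7f7fu);
          }
      }
  }
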
> 
> My understanding was that the ARB_multisample extension could be
> implemented using supersampling (even if it's not actually done using
> multisampling), and that enabling ARB_multisample was functionally
> equivalent to clicking the FSAA checkbox in the driver.  If that's not the
> case, then that's been the source of my confusion all along.

It may be true for cards that support ARB_multisample.  However, according
to ATI, only the Radeon 9500 / 9700 support that extension.  All of the
other cards use supersampling, for which there is no extension.  If
ARB_multisample could be implemented somehow using supersampling, I would
think that at least the 8500 / 9000 would support it.
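
For reference, here's roughly all an app has to do to ask for it on the
GLX path (tokens are from GLX_ARB_multisample and GL_ARB_multisample;
error handling and fallbacks omitted):

  #include <GL/glx.h>
  #include <GL/gl.h>

  #ifndef GLX_SAMPLE_BUFFERS_ARB
  #define GLX_SAMPLE_BUFFERS_ARB 100000
  #define GLX_SAMPLES_ARB        100001
  #endif
  #ifndef GL_MULTISAMPLE_ARB
  #define GL_MULTISAMPLE_ARB     0x809D
  #endif

  static XVisualInfo *choose_fsaa_visual(Display *dpy, int screen)
  {
      int attribs[] = {
          GLX_RGBA, GLX_DOUBLEBUFFER,
          GLX_RED_SIZE, 5, GLX_GREEN_SIZE, 5, GLX_BLUE_SIZE, 5,
          GLX_DEPTH_SIZE, 16,
          GLX_SAMPLE_BUFFERS_ARB, 1,   /* request a multisample buffer */
          GLX_SAMPLES_ARB, 4,          /* with 4 samples per pixel */
          None
      };
      return glXChooseVisual(dpy, screen, attribs);
  }

  /* ...and once a context on that visual is current: */
  /*     glEnable(GL_MULTISAMPLE_ARB);                 */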

> > > Like, I thought that was the point to OpenGL's design in general - that the
> > > driver would use the high-level information that's present in order to tune
> > > its low-level operation, completely transparently to the user and
> > > application.
> > 
> > It does *work*.  However, it can be very difficult for the GL to make the
> > right choices.  Look at how CPUs work, for example.  There's lots of
> > prefetch hardware in there to make the memory system faster, but adding
> > prefetch instructions in the right places can make a world of difference.
> > 
> > Any time a library makes just-in-time optimization / fast-path choices,
> > there is some chance that it will be wrong.  If it happens to be wrong for
> > some huge, critical app, that can be the difference in which system
> > somebody chooses.
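
(Concretely, the sort of thing I meant there, using gcc's
__builtin_prefetch -- a toy example, not from any driver:)

  /* Toy example: explicitly prefetching a few iterations ahead with
   * gcc's __builtin_prefetch.  The hardware prefetcher often gets this
   * right on its own; placing hints by hand is the analogous
   * "tuning knob" talked about above. */
  float sum_array(const float *data, int n)
  {
      float sum = 0.0f;
      int i;
      for (i = 0; i < n; i++) {
          if (i + 16 < n)
              __builtin_prefetch(&data[i + 16]);  /* just a hint */
          sum += data[i];
      }
      return sum;
  }
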
> 
> But how could it be wrong in such a way that some other choice could be
> right?  I mean, if the application sends the vertex array to it in a
> certain format, either the card can support that format or it can't and the
> driver has to convert it, right?  So is it just a matter of which
> conversion is least-sucky?

I guess "wrong" was a poor word choice.  I didn't mean wrongs as in
producing incorrect results.  I meant wrong as in having sub-optimal
performance.  My problem with coming up with examples here is that nobody
has done enough detailed performance testing of DRI drivers with enough apps
to determine what addition fast-paths might be needed.  Right now, for the
most part, there is one path.
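
To make that concrete, here's the kind of dispatch I'm talking about.
This is purely hypothetical -- the struct and helper names are made up,
and no current DRI driver looks like this:

  /* Hypothetical sketch of "one path vs. several fast paths" for a
   * client color array.  Everything below the #include is invented
   * for illustration. */
  #include <GL/gl.h>

  struct vertex_buffer;   /* placeholder for some driver-private state */

  extern void emit_ubyte4_colors(struct vertex_buffer *, const void *, int);
  extern void convert_float4_to_ubyte4(struct vertex_buffer *, const void *, int);
  extern void fallback_slow_path(struct vertex_buffer *, GLenum, GLint,
                                 const void *, int);

  void upload_color_array(struct vertex_buffer *vb, GLenum type, GLint size,
                          const void *ptr, int count)
  {
      if (type == GL_UNSIGNED_BYTE && size == 4) {
          /* matches what the hardware eats: copy straight through */
          emit_ubyte4_colors(vb, ptr, count);
      } else if (type == GL_FLOAT && size == 4) {
          /* convert on the CPU; whether to add a second fast path here,
           * and for which formats, is exactly the kind of guess that
           * can be "wrong" (i.e. sub-optimal) for a given app */
          convert_float4_to_ubyte4(vb, ptr, count);
      } else {
          fallback_slow_path(vb, type, size, ptr, count);
      }
  }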

> > Right now there is no example of this in DRI.  That doesn't mean that there
> > won't ever be.  The coming future of vertex & fragment programs only
> > INCREASES the likelihood.
> 
> Wouldn't vertex/fragment programs already be using the card's native
> format(s) though?  Once the client state is all configured and that
> glDrawElements() call happens, wouldn't the driver have to either decide
> that the format is something the hardware supports, or convert it into
> something which it does?

The case I was thinking of for vertex / fragment programs was in the
compilation of the programs to native code.  It would be akin to selecting
specific optimization flags to pass to gcc.  Just saying '-O2 -march=i686'
doesn't always produce the best results.  Quite often you want to go in and
set very specific optimization flags.  There's no way to do this (nor
should there be!!!) in any of the vertex / fragment program specs.
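
That's because the whole interface the app has is basically "here's the
program text".  For example, with ARB_vertex_program (assuming the
extension entry points have been resolved, e.g. via glXGetProcAddressARB):

  /* All the app hands the GL is the program string itself; there is no
   * slot for compiler or optimization flags. */
  #include <string.h>
  #include <GL/gl.h>
  #define GL_GLEXT_PROTOTYPES 1
  #include <GL/glext.h>

  static const char vp_text[] =
      "!!ARBvp1.0\n"
      "ATTRIB pos = vertex.position;\n"
      "PARAM  mvp[4] = { state.matrix.mvp };\n"
      "DP4 result.position.x, mvp[0], pos;\n"
      "DP4 result.position.y, mvp[1], pos;\n"
      "DP4 result.position.z, mvp[2], pos;\n"
      "DP4 result.position.w, mvp[3], pos;\n"
      "MOV result.color, vertex.color;\n"
      "END\n";

  static void load_vertex_program(void)
  {
      GLuint prog;

      glGenProgramsARB(1, &prog);
      glBindProgramARB(GL_VERTEX_PROGRAM_ARB, prog);
      glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                         (GLsizei) strlen(vp_text), vp_text);
      glEnable(GL_VERTEX_PROGRAM_ARB);
  }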

A good example is (will be?) NV30 (and perhaps NV20, but I'm not 100%
positive).  That hardware supports a reduced precision floating point
format.  This is a register format (like using 16-bit x86 registers vs.
32-bit x86 registers).  For some calculations (e.g., those involving
color), using fp16 doesn't make any difference in the output and
improves performance.  Their compiler makes some fairly conservative
assumptions about when it can use this format.  For some apps, it might be a
performance win to tell it to relax those assumptions a little bit.
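
(For the curious: NV30's native interface, NV_fragment_program, exposes
that choice right in the instruction set via R/H precision suffixes.  I
may be off on the exact syntax, but it looks something like this:)

  /* Illustration only; precision is chosen per instruction
   * (R = fp32, H = fp16).  Treat the exact syntax as approximate. */
  static const char modulate_fp32[] =
      "!!FP1.0\n"
      "MULR R0, f[COL0], f[TEX0];\n"    /* full-precision multiply */
      "MOVR o[COLR], R0;\n"
      "END\n";

  static const char modulate_fp16[] =
      "!!FP1.0\n"
      "MULH H0, f[COL0], f[TEX0];\n"    /* half-precision multiply */
      "MOVR o[COLR], H0;\n"
      "END\n";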

If an app uses ARB_{vertex,fragment}_program, there is no way for it to tell
the GL about that.  Even if it were possible, it would be in the same
situation as with anisotropic filtering:  how can the app support the "next"
thing that comes along?

-- 
Smile!  http://antwrp.gsfc.nasa.gov/apod/ap990315.html

