José Fonseca wrote:
On Fri, Apr 04, 2003 at 08:48:35AM -0700, Brian Paul wrote:

In general, this sounds reasonable but you also have to consider performance.
The glVertex, glColor, glTexCoord, etc. commands have to be simple and fast. As it is now, glColor4f, for example, when implemented in x86 assembly, is just a jump into _tnl_Color4f(), which stuffs the color into the immediate struct and returns. Something similar is done in the R200 driver.


If the implementation of _tnl_Color4f() involves a call to producer->Color4f() we'd lose some performance.
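
To make the concern concrete, here is a minimal C++ sketch of the two paths. It is not Mesa code: Immediate, Producer, direct_Color4f, forwarding_Color4f and dispatch_Color4f are hypothetical stand-ins, and a single global takes the place of the current context. The direct entry stores the color and returns; the forwarding entry pays one extra virtual call per glColor4f.

```cpp
#include <cassert>

// Hypothetical stand-in for the immediate-mode vertex state.
struct Immediate {
    float color[4] = {0.0f, 0.0f, 0.0f, 1.0f};
};

// Assumption: one global replaces the "current context" lookup.
static Immediate g_immediate;

// Direct path: the dispatch entry writes the color and returns.
static void direct_Color4f(float r, float g, float b, float a) {
    g_immediate.color[0] = r;
    g_immediate.color[1] = g;
    g_immediate.color[2] = b;
    g_immediate.color[3] = a;
}

// Indirect path: an abstract producer interface.
struct Producer {
    virtual ~Producer() = default;
    virtual void Color4f(float r, float g, float b, float a) = 0;
};

struct ImmediateProducer : Producer {
    Immediate *imm;
    explicit ImmediateProducer(Immediate *i) : imm(i) {}
    void Color4f(float r, float g, float b, float a) override {
        imm->color[0] = r;
        imm->color[1] = g;
        imm->color[2] = b;
        imm->color[3] = a;
    }
};

static Producer *g_producer = nullptr;

// The dispatch entry forwards to the producer: one more (virtual) call.
static void forwarding_Color4f(float r, float g, float b, float a) {
    assert(g_producer != nullptr);
    g_producer->Color4f(r, g, b, a);
}

// Stand-in for one slot of the GL dispatch table.
using Color4fFn = void (*)(float, float, float, float);
static Color4fFn dispatch_Color4f = direct_Color4f;
```

Both entries leave the same state behind; the only difference is the extra indirection in forwarding_Color4f, which is the cost being discussed.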


I know, but my objective is to design a good object interface into
which all drivers may fit and through which they can reuse code. When a
driver gets to the point where the producer->Color4f() calls are the
main performance bottleneck (!?), the developer is free to write a
tailored version of TnLProducer that eliminates that extra call:

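class TnLProducerFast {
public:
    Vertex current;
    TnLConsumer *consumer;

    TnLProducerFast(TnLConsumer *_consumer) {
        consumer = _consumer;
    }

    // Install the static entry points directly in the dispatch table.
    void activate() {
        _glapi_setapi(GL_COLOR3f, _Color3f);
        ...
    }

    // Writes straight into the current vertex; no producer call.
    static void _Color3f(GLfloat r, GLfloat g, GLfloat b) {
        TnLProducerFast *self = GET_THIS_PTR_FROM_CURRENT_CTX();
        self->current.r = r; self->current.g = g; self->current.b = b;
    }
};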


We can even generate this TnLProducerFast automatically from the
original TnLProducer with a template, i.e.,

template < class T > class TnLProducerTmpl {
public:
    T tnl;

    void activate() {
        _glapi_setapi(GL_COLOR3f, _Color3f);
        ...
    }

    static void _Color3f(GLfloat r, GLfloat g, GLfloat b) {
        TnLProducerTmpl *self = GET_THIS_PTR_FROM_CURRENT_CTX();
        self->tnl.Color3f(r, g, b); // This call is eliminated if T::Color3f
                                    // is inlined
    }
};


typedef TnLProducerTmpl< TnLProducer > TnLProducerFast;
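
For reference, a self-contained variant of the template idea that compiles on its own might look like the sketch below. The dispatch slot g_color3f, the global g_current_producer (standing in for GET_THIS_PTR_FROM_CURRENT_CTX()) and the entry-point name Color3f_entry are assumptions made for illustration; nothing here is the real glapi interface.

```cpp
#include <cassert>

struct Vertex {
    float r = 0.0f, g = 0.0f, b = 0.0f;
};

// Plain producer whose Color3f is small enough to be inlined.
class TnLProducer {
public:
    Vertex current;
    inline void Color3f(float r, float g, float b) {
        current.r = r;
        current.g = g;
        current.b = b;
    }
};

// Assumption: one dispatch slot and one global "current producer"
// replace the real dispatch table and context lookup.
using Color3fFn = void (*)(float, float, float);
static Color3fFn g_color3f = nullptr;
static void *g_current_producer = nullptr;

template <class T>
class TnLProducerTmpl {
public:
    T tnl;

    // Make this producer current and install its static entry point.
    void activate() {
        g_current_producer = this;
        g_color3f = &TnLProducerTmpl::Color3f_entry;
    }

    // Static entry point: fetch the current producer and forward.
    // If T::Color3f is inlined, no call remains beyond this function.
    static void Color3f_entry(float r, float g, float b) {
        auto *self = static_cast<TnLProducerTmpl *>(g_current_producer);
        assert(self != nullptr);
        self->tnl.Color3f(r, g, b);
    }
};

typedef TnLProducerTmpl<TnLProducer> TnLProducerFast;
```

After activate(), a call through g_color3f lands in Color3f_entry and updates tnl.current. Whether the inner call really disappears depends on the compiler inlining T::Color3f.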

But this is all of _very_ _little_ importance compared to the ability
to _write_ a full driver fast, which a well-designed OOP interface
provides. As I have said here several times, this kind of low-level
optimization consumes so much development time that higher-level
optimizations (usually with much more impact on performance) are never
attempted.

The optimization of the vertex api has yielded huge improvements. Even with the runtime-codegenerated versions of these functions in the radeon/r200 driver, they *still* dominate viewperf profile runs - meaning that *all other optimizations* are a waste of time for viewperf, because 60% of your time is being spent in the vertex api functions.



Nowadays, vertex arrays are the path to use if you really care about
performance, of course, but a lot of apps still use the regular
per-vertex GL functions.

Except for applications that already exist and use the vertex apis -- of which there are many.


And vertex arrays aren't the fastpath any more; things like ARB_vertex_array_object or NV_vertex_array_range are.


Now that you mention vertex arrays: for those, the producer would be
different, but the consumer would be the same.
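
A rough sketch of that split, assuming a consumer that simply receives finished vertices: ImmediateProducer, ArrayProducer, CollectingConsumer, emit() and the Vertex layout are all hypothetical names chosen for the example, not part of any existing driver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex {
    float x, y, z;
    float r, g, b;
};

// The consumer side is the same whichever producer feeds it.
class TnLConsumer {
public:
    virtual ~TnLConsumer() = default;
    virtual void emit(const Vertex &v) = 0;
};

// Trivial consumer that only records what it receives.
class CollectingConsumer : public TnLConsumer {
public:
    std::vector<Vertex> vertices;
    void emit(const Vertex &v) override { vertices.push_back(v); }
};

// Producer for per-vertex calls: keeps the current color and emits
// one vertex each time a position arrives.
class ImmediateProducer {
    Vertex current{0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f};
    TnLConsumer *consumer;
public:
    explicit ImmediateProducer(TnLConsumer *c) : consumer(c) {
        assert(c != nullptr);
    }
    void Color3f(float r, float g, float b) {
        current.r = r; current.g = g; current.b = b;
    }
    void Vertex3f(float x, float y, float z) {
        current.x = x; current.y = y; current.z = z;
        consumer->emit(current);
    }
};

// Producer for vertex arrays: walks the arrays and emits each vertex.
// Assumes 3 floats per position and 3 floats per color, tightly packed.
class ArrayProducer {
    TnLConsumer *consumer;
public:
    explicit ArrayProducer(TnLConsumer *c) : consumer(c) {
        assert(c != nullptr);
    }
    void DrawArrays(const float *pos, const float *col, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            consumer->emit(Vertex{pos[3 * i], pos[3 * i + 1], pos[3 * i + 2],
                                  col[3 * i], col[3 * i + 1], col[3 * i + 2]});
        }
    }
};
```

Both producers can be constructed on the same CollectingConsumer, which then sees one uniform stream of vertices regardless of how the application submitted them.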

For developing a driver, it's not necessary to touch the tnl code at all - even hardware t&l drivers can quite happily plug into the existing mechanisms and get OK performance.


Keith



_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
