On Fri, Apr 04, 2003 at 10:08:36AM -0800, Ian Romanick wrote:
Right now people use things like Viewperf to make systems purchase decisions. Unless the graphics hardware and the rest of the system are very mismatched, the immediate API already has an impact on performance in those benchmarks.
The performance of the immediate API *is* important to real applications. Why do you think Sun came up with the SUN_vertex extension? To reduce the overhead of the immediate API, of course. :)
[sample code cut]
But all of this is of _very_ _little_ importance compared with the ability to _write_ a full driver quickly, which is what a well designed OOP interface gives us. As I've said here several times, this kind of low-level optimization consumes so much development time that higher-level optimizations (usually with much more impact on performance) are never attempted.
In principle, I think the producer/consumer idea is good. Why not implement known optimizations in it from the start? We already have *working code* to build formatted vertex data (see the radeon & r200 drivers), so why not build the object model from there? Each concrete producer class would have an associated vertex format. On creation, it would fill in a table of functions that put data in its vertex buffer. This could mean pointers to generic C functions, or it could mean dynamically generating code from assembly stubs.
The idea is that the functions from this table could be put directly in the dispatch table. This is, IMHO, critically important.
The various vertex functions then just need to call the object's produce method. This all boils down to putting a C++ face on a technique that has been demonstrated to work.
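A minimal sketch of that producer idea might look like the following. All names here (VertexProducer, vertex3f_xyz, etc.) are hypothetical illustrations, not actual driver code; the point is that a per-format "emit" function could be slotted straight into the GL dispatch table so glVertex3f() reaches it with no extra indirection:

```cpp
// Hypothetical sketch: each concrete producer owns a vertex layout and a
// table of emit functions generated for that layout.  The std::vector here
// stands in for a hardware DMA buffer.
#include <cassert>
#include <cstddef>
#include <vector>

struct VertexProducer {
    size_t vertexStride;          // floats per vertex for the current format
    std::vector<float> buffer;    // stand-in for a DMA buffer
    size_t count = 0;             // vertices produced so far

    explicit VertexProducer(size_t stride) : vertexStride(stride) {}

    // "produce" copies one completed vertex into the output buffer.
    void produce(const float *vertex) {
        buffer.insert(buffer.end(), vertex, vertex + vertexStride);
        ++count;
    }
};

// One entry of the per-format function table.  A real driver would generate
// one of these (as a C function or an assembly stub) per vertex format and
// install the pointer directly in the dispatch table.
static VertexProducer *g_producer;
static float g_current[16];       // scratch space for the vertex being built

static void vertex3f_xyz(float x, float y, float z) {
    g_current[0] = x; g_current[1] = y; g_current[2] = z;
    g_producer->produce(g_current);
}
```

The design choice this illustrates: because the emit function is specialized per format at producer-creation time, the per-vertex path has no format dispatch left in it.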
I hope that integrating assembly generation with C++ is feasible, but I see that as an implementation issue, apart from the performance issues, which according to everyone who has replied are not as negligible as I thought. The reason is that this kind of optimization is very dependent on the vertex formats and other hardware details, making the code difficult to reuse, which is exactly what I want to avoid at this stage.
I do have one question. Do we really want to invoke the producer on every vertex immediately? In the radeon / r200 drivers this just copies the whole vertex to a DMA buffer. Why not generate the data directly where it needs to go? I know that if the vertex format changes before the vertex is complete we need to copy out of the temporary buffer into the GL state vector, but that doesn't seem like the common case. At the very least, some guys at Intel think generating data directly in DMA buffers is the way to go:
http://www.intel.com/technology/itj/Q21999/ARTICLES/art_4.htm
This is a very interesting read. Thanks for the pointer.
It's complicated to know the vertices' positions in the DMA buffer from the beginning, especially because of clipping, since vertices can be added or removed; but if I understood correctly, it's still better to work in DMA memory and move the vertices around to avoid cache misses. It can be very tricky, though: imagine that clipping generates vertices that no longer fit in the DMA buffer, what would be done then?
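One plausible way to handle that overflow, sketched below with hypothetical names (this is an assumption on my part, not what the drivers actually do): let the emit path into the fixed-size DMA buffer fail when full, so the caller can flush the buffer and retry the vertex:

```cpp
// Illustrative sketch: emitting vertices straight into a fixed-size DMA
// buffer, with a flush-and-retry path for when clipping produces more
// vertices than the buffer can hold.  Names and sizes are made up.
#include <cassert>
#include <cstddef>

struct DmaBuffer {
    static const size_t kCapacity = 4;    // tiny on purpose, for the example
    float data[kCapacity * 3];
    size_t used = 0;                      // vertices currently in the buffer

    bool emit(const float v[3]) {
        if (used == kCapacity)
            return false;                 // full: caller must flush first
        float *dst = data + used * 3;
        dst[0] = v[0]; dst[1] = v[1]; dst[2] = v[2];
        ++used;
        return true;
    }

    void flush() { used = 0; }            // a real driver would fire the DMA here
};

// Emit a clipped polygon; if the buffer fills mid-polygon, flush and retry
// the vertex that did not fit.  Returns how many flushes were needed.
size_t emitClipped(DmaBuffer &buf, const float (*verts)[3], size_t n) {
    size_t flushes = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!buf.emit(verts[i])) {
            buf.flush();
            ++flushes;
            buf.emit(verts[i]);           // retry into the now-empty buffer
        }
    }
    return flushes;
}
```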
The thing I found most interesting is the issue of applying the TCL operations to all the vertices at once versus to one vertex at a time. From previous discussions on this list it seems that nowadays most CPU performance is dictated by the cache, so the latter option really does seem more efficient, but Mesa implements the former (the passes are even called "pipeline stages"), and changing that would mean a big overhaul of the TnL module.
On a historical note, the earliest versions of Mesa processed a single vertex at a time instead of operating on arrays of vertices, stage by stage. Moving to the latter was a big speed-up at the time.
Since the T&L code is a module, one could implement the single-vertex scheme as an alternate module. It would be an interesting experiment.
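For concreteness, the two schedules being compared can be reduced to a toy like the one below. The arithmetic is a placeholder; what matters is the loop structure and which data stays hot in cache:

```cpp
// Toy comparison of the two schedules discussed above: Mesa-style
// "stage at a time" over the whole vertex array, versus pushing each
// vertex through every stage before touching the next one.
#include <cassert>
#include <vector>

struct V { float pos; float lit; };

// Stage-at-a-time: each pass streams over the whole array, so a large
// array is pulled through the cache once per stage.
void pipelineStages(std::vector<V> &vs) {
    for (V &v : vs) v.pos *= 2.0f;         // "transform" stage
    for (V &v : vs) v.lit = v.pos + 1.0f;  // "lighting" stage
}

// Vertex-at-a-time: each vertex stays hot in cache through both stages.
void perVertex(std::vector<V> &vs) {
    for (V &v : vs) {
        v.pos *= 2.0f;
        v.lit = v.pos + 1.0f;
    }
}
```

Both orderings compute the same result; the difference is purely in memory traffic, which is why one could be swapped in as an alternate TnL module for the experiment suggested above.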
I guess my point is that we *can* have our cake and eat it too. We can have a nice object model and still have "classic" low-level optimizations. The benefit of doing those optimizations at the level of the object model is that they only need to be done once for a given vertex format. Reusing optimizations sounds like a big win to me! :)
I hope so. But at this point I'll just try to design the objects so they allow both kinds of implementation.
Thanks for the feedback.
José Fonseca
-Brian
_______________________________________________ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel