Hi all,

Putting together some bits and pieces from the list and adding in a generous helping of my own wild imagination, here is an estimate of card performance from the geometry DMA point of view.

I'll postulate a current drawing position in pixels so that I can then introduce the concept of a 'small triangle' with vertices expressed in 8.8 fixed point format and texture coordinates also expressed in 8.8, so each coordinate is 16 bits. Together with (u,v) texture coordinates, vertices are 8 bytes each. Adding in a 32 bit command/flags header, each triangle requires 36 bytes, and just to round it up to 40, let's throw in a 24 bit normal, rounded up to 32 bits.

When triangle throughput is high, most triangles will be 'small triangles'. (Large triangles would require an additional 12 bytes of vertex data, giving 48 bytes. Something would also have to be done about large textures. In any event, this is just a plausible scenario to use as a basis for command throughput estimation, not a design proposal.)

Picking a number out of thin air, I'll guess that we can push 5 million small triangles/second through the render pipe. At 40 bytes each, that's 200 MB/sec through the command buffer. (For now, I'll assume that all geometry goes through the command buffer. Though there may be good reasons for doing otherwise, it doesn't affect the throughput estimates.)

This exceeds the 132 MB/sec of a 32 bit, 33 MHz PCI bus, which is what consumer motherboards seem to have these days, and we haven't considered texture DMA yet. Geometry throughput could be improved with triangle fans and strips, vertex arrays, and on-card texture projection instead of explicit texture coordinates. However, this is extra complication, so I'll just continue looking at what happens with the simple-minded approach. I'll also ignore the PCI bus limitation for now, because interfaces to better buses are planned and PCI gets us 66% of the target throughput anyway (132 of 200 MB/sec).

OK, we probably want to generate a low-buffer interrupt every 5 ms or so (200 Hz) to wake up our rendering task. If the rate is higher than this, context switch overhead starts to add up. So now we see that the command buffer has to be about one meg in size (200 MB/sec x 5 ms = 1 MB).

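The arithmetic above can be checked with a short Python sketch. The constant and function names are made up for illustration, and the 40 bytes per triangle and 5 million triangles/second are just the guesses from the text, taken as given rather than derived.

```python
# Back-of-envelope check of the geometry DMA estimate.
# All names are hypothetical; the numbers are the guesses from the text.

BYTES_PER_SMALL_TRIANGLE = 40       # rounded-up figure from the text
TRIANGLES_PER_SECOND = 5_000_000    # "picked out of thin air"
PCI_BYTES_PER_SECOND = 132_000_000  # 33 MHz PCI peak, as quoted in the text
INTERRUPT_RATE_HZ = 200             # one low-buffer interrupt every 5 ms


def command_bandwidth(bytes_per_triangle, triangles_per_second):
    """Bytes/second that must flow through the command buffer."""
    return bytes_per_triangle * triangles_per_second


def buffer_bytes(bandwidth, interrupt_rate_hz):
    """Bytes consumed between two low-buffer interrupts."""
    return bandwidth // interrupt_rate_hz


def bus_fraction(bus_bandwidth, required_bandwidth):
    """Share of the required bandwidth that the bus can deliver."""
    return bus_bandwidth / required_bandwidth


bandwidth = command_bandwidth(BYTES_PER_SMALL_TRIANGLE, TRIANGLES_PER_SECOND)
buffer_size = buffer_bytes(bandwidth, INTERRUPT_RATE_HZ)
pci_share = bus_fraction(PCI_BYTES_PER_SECOND, bandwidth)

print(f"command bandwidth: {bandwidth / 1e6:.0f} MB/sec")
print(f"buffer per interrupt period: {buffer_size / 1e6:.1f} MB")
print(f"PCI share of required bandwidth: {pci_share:.0%}")
```

Changing any one guess (say, a lower interrupt rate or a different triangle size) scales the buffer size linearly, which is why the one meg figure for the command buffer is only as good as the guesses behind it.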
The command buffer should be physically contiguous (or else the ring buffer logic gets messy), so it has to be allocated at boot time, before physical memory gets too fragmented. That constraint is probably the best argument for implementing indirect geometry DMA, but note that a one meg DMA buffer isn't particularly out of the ordinary these days.

So this is all by way of convincing myself that if we do the DMA in a fairly simple-minded way, it doesn't work out too badly. Any major oversights?

Regards,

Daniel
