Hi all,

Putting together some bits and pieces from the list and adding in a 
generous helping of my own wild imagination, here is an estimate of 
card performance from the geometry DMA point of view.

I'll postulate a current drawing position in pixels so that I can then 
introduce the concept of a 'small triangle', whose vertices are 
expressed in 8.8 fixed point format relative to that position, with 
texture coordinates also in 8.8, so each coordinate is 16 bits.  
Together with the (u,v) texture coordinates, vertices are 8 bytes each, 
or 24 bytes for all three.  Adding in a 32 bit command/flags header 
gives 28 bytes per triangle, and a 24 bit normal, rounded up to 32 
bits, brings it to 32; just to leave some headroom, let's round that up 
to a budget of 40 bytes.  When triangle throughput is high, most 
triangles will be 'small triangles'.  (Large triangles would require an 
additional 12 bytes of vertex data, giving 44 bytes.  Something would 
also have to be done about large textures.  In any event, this is just 
a plausible scenario to use as a basis for command throughput 
estimation; it's not a design proposal.)
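To make the byte count concrete, here is a rough Python sketch of one possible packing of a small triangle. Everything about the layout is an assumption for illustration: the field order, little-endian byte order, signed 8.8 values (range -128 to just under +128) with (x,y) taken relative to the current drawing position, and the names `SMALL_TRIANGLE`, `to_fixed_8_8` and `pack_small_triangle`. Three 8-byte vertices plus a 4-byte header and a 4-byte normal pack into 32 bytes, which are then zero-padded to the 40-byte budget.

```python
import struct

# Hypothetical little-endian layout for one 'small triangle' command:
#   uint32  command/flags header
#   3 x (int16 x, int16 y, int16 u, int16 v), each signed 8.8 fixed point
#   uint32  24-bit normal padded to 32 bits
SMALL_TRIANGLE = struct.Struct("<I" + "hhhh" * 3 + "I")
TRIANGLE_BUDGET = 40  # bytes per triangle, rounded up for headroom


def to_fixed_8_8(value: float) -> int:
    """Convert to signed 8.8 fixed point (8 integer bits, 8 fraction bits)."""
    raw = round(value * 256)
    if not -32768 <= raw <= 32767:
        raise ValueError(f"{value} does not fit in signed 8.8")
    return raw


def pack_small_triangle(header: int, vertices, normal: int) -> bytes:
    """Pack a header, three (x, y, u, v) vertices and a 24-bit normal,
    then zero-pad the result to the 40-byte budget."""
    if len(vertices) != 3:
        raise ValueError("a triangle has exactly three vertices")
    fields = [header]
    for x, y, u, v in vertices:
        fields.extend(to_fixed_8_8(c) for c in (x, y, u, v))
    fields.append(normal & 0xFFFFFF)  # keep only the 24 normal bits
    return SMALL_TRIANGLE.pack(*fields).ljust(TRIANGLE_BUDGET, b"\0")


print(SMALL_TRIANGLE.size)  # 32 bytes before padding
packet = pack_small_triangle(0x1, [(0, 0, 0, 0), (1.5, 0, 1, 0), (0, 2.25, 0, 1)], 0x123456)
print(len(packet))          # 40
```

The padding bytes are simply wasted bandwidth here; a real format might use them for colour or depth, but that is outside this estimate.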

Picking a number out of thin air, I'll guess that we can push 5 million 
small triangles/second through the render pipe.  At 40 bytes each, 
that's 200 MB/sec through the command buffer.  (For now, I'll assume 
that all geometry goes through the command buffer.  Though there may be 
good reasons for doing otherwise, that choice doesn't affect the 
throughput estimates.)
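The bandwidth figure is just rate times packet size. A trivial Python check, where MB means 10^6 bytes and the function name `command_bandwidth` is only illustrative:

```python
# Command-buffer bandwidth implied by the guessed triangle rate.
# MB here means 10**6 bytes.
TRIANGLES_PER_SEC = 5_000_000   # guessed render-pipe throughput
BYTES_PER_TRIANGLE = 40         # small-triangle budget


def command_bandwidth(triangles_per_sec: int, bytes_per_triangle: int) -> int:
    """Bytes/sec the geometry DMA must move if every triangle
    goes through the command buffer."""
    return triangles_per_sec * bytes_per_triangle


bandwidth = command_bandwidth(TRIANGLES_PER_SEC, BYTES_PER_TRIANGLE)
print(f"{bandwidth / 1e6:.0f} MB/sec")  # prints "200 MB/sec"
```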

This exceeds the 132 MB/sec peak of a 32 bit, 33 MHz PCI bus, which is 
what consumer motherboards seem to have these days, and we haven't 
considered texture DMA yet.  Geometry throughput could be improved with 
triangle fans and strips, vertex arrays, and on-card texture projection 
instead of explicit texture coordinates; however, these add 
complication, so I'll just continue looking at what happens with the 
simple-minded approach.  I'll also ignore the PCI bus limitation for 
now, because interfaces to better buses are planned, and even plain PCI 
would give us 66% of the target throughput (132 of 200 MB/sec).
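The PCI comparison can be sketched the same way. This assumes a 32 bit bus moving one word per clock with no arbitration or address-phase overhead, so it is an upper bound and real transfers will be slower; the constant names are illustrative. It also shows the triangle rate that plain PCI could sustain with this packing.

```python
# Peak bandwidth of a plain 32-bit, 33 MHz PCI bus versus the estimate above.
# Assumes one 4-byte transfer per clock and no protocol overhead (an upper bound).
PCI_CLOCK_HZ = 33_000_000       # nominal 33 MHz (strictly 33.33 MHz)
PCI_WIDTH_BYTES = 4             # 32-bit data path
TARGET_BANDWIDTH = 200_000_000  # bytes/sec: 5M triangles x 40 bytes
BYTES_PER_TRIANGLE = 40

pci_peak = PCI_CLOCK_HZ * PCI_WIDTH_BYTES          # 132,000,000 bytes/sec
fraction_of_target = pci_peak / TARGET_BANDWIDTH   # share of the target rate
pci_limited_rate = pci_peak // BYTES_PER_TRIANGLE  # small triangles/sec

print(f"PCI peak: {pci_peak / 1e6:.0f} MB/sec")
print(f"Fraction of target: {fraction_of_target:.0%}")
print(f"PCI-limited rate: {pci_limited_rate / 1e6:.1f} M triangles/sec")
```

So on plain PCI, and before any texture traffic, this packing tops out at roughly 3.3 million small triangles/sec.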

OK, we probably want to generate a low-buffer interrupt every 5 ms or so 
to wake up our rendering task (200 Hz).  If the rate is higher than 
this, context switch overhead starts to add up.  At 200 MB/sec, 5 ms 
worth of commands is 1 MB, so the command buffer has to be about one 
meg in size.  This should be physically contiguous (or else the ring 
buffer logic gets messy), so it has to be allocated at boot time before 
physical memory gets too fragmented.  This is probably the best 
argument for implementing indirect geometry DMA, but note that a one 
meg DMA buffer isn't particularly out of the ordinary these days.
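A rough Python model of the sizing and of the ring buffer itself, assuming the host advances a write pointer, the card advances a read pointer, and the card raises the low-buffer interrupt when its backlog drops to a low-water mark. `CommandRing`, its methods and the quarter-buffer low-water mark are hypothetical, not a proposed register interface. Because the buffer is one contiguous range, wraparound is a single modulo on a byte offset; a scattered buffer would need extra address translation at each page crossing.

```python
def ring_size_needed(bandwidth_bytes_per_sec: int, interval_ms: int) -> int:
    """Bytes of commands the card consumes between two wakeups."""
    return bandwidth_bytes_per_sec * interval_ms // 1000


class CommandRing:
    """Hypothetical software model of a physically contiguous command ring.

    head: next byte offset the host will write.
    tail: next byte offset the card will DMA.
    One byte stays unused so that head == tail always means 'empty'.
    """

    def __init__(self, size: int, low_water: int):
        self.size = size
        self.low_water = low_water
        self.head = 0
        self.tail = 0

    def used(self) -> int:
        return (self.head - self.tail) % self.size

    def free(self) -> int:
        return self.size - 1 - self.used()

    def write(self, nbytes: int) -> None:
        """Host side: append nbytes of commands, wrapping past the end."""
        if nbytes > self.free():
            raise BufferError("ring full; wait for the low-buffer interrupt")
        self.head = (self.head + nbytes) % self.size

    def consume(self, nbytes: int) -> bool:
        """Card side: DMA up to nbytes. Returns True when this transfer
        drops the backlog to the low-water mark, i.e. the moment the
        card would raise the low-buffer interrupt."""
        was_above = self.used() > self.low_water
        self.tail = (self.tail + min(nbytes, self.used())) % self.size
        return was_above and self.used() <= self.low_water


size = ring_size_needed(200_000_000, 5)        # 1,000,000 bytes
ring = CommandRing(size, low_water=size // 4)  # interrupt at 1/4 full (arbitrary)
ring.write(size - 1)                           # host fills the ring
print(ring.consume(800_000))                   # True: backlog now below 250,000
```

Where to put the low-water mark is a tuning choice: set it too low and the card can starve before the rendering task gets scheduled, too high and the interrupt rate creeps back up.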

So this is all by way of convincing myself that if we do the DMA in a 
fairly simple-minded way, it doesn't work out too badly.  Any major 
oversights?

Regards,

Daniel
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)