Hi Timothy,

On Wednesday 02 February 2005 15:56, Timothy Miller wrote:
> On Wed, 2 Feb 2005 15:32:45 -0500, Daniel Phillips
> <[EMAIL PROTECTED]> wrote:
> > So this is all by way of convincing myself that if we do the DMA in
> > a fairly simple-minded way, it doesn't work out too badly.  Any
> > major oversights?
>
> No, this is good analysis.  One thing:  My initial estimate for
> maximum triangle throughput was 1 million triangles/second.  That's
> from an estimate of 32-word command packets on 33 MHz PCI.  Someone
> else told me that that's not bad, considering the CPU overhead just
> for computing the geometry in the first place.

So far, so good.
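
For reference, the quoted 1 million figure can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes a 32-bit PCI bus moving at most one word per clock and ignores arbitration and burst setup, and the function name is made up for illustration.

```python
# Rough check of the 1 million triangles/second estimate.
# Assumption: 32-bit PCI transfers at most one 32-bit word per clock,
# with no allowance for arbitration or burst setup overhead.
PCI_CLOCK_HZ = 33_000_000
WORDS_PER_PACKET = 32  # packet size from the quoted estimate


def packets_per_second(clock_hz: int, words_per_packet: int) -> float:
    """Upper bound on command packets per second at one word per clock."""
    return clock_hz / words_per_packet


if __name__ == "__main__":
    rate = packets_per_second(PCI_CLOCK_HZ, WORDS_PER_PACKET)
    print(f"{rate / 1e6:.2f} million packets/second")
```

With one triangle per packet this comes out just over a million per second, so the estimate is an upper bound rather than a sustained rate.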

Using the simple-minded "small triangle" compression, 3 million 
triangles/second seems like a more worthy goal.  That's 100 thousand 
triangles/frame at 30 frames/sec, not too shabby for a PCI card.  The 
on-card triangle setup doesn't look like a bottleneck at all.  Maybe 
it's time to take a stab at estimating that.
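
A quick sketch of what the 3 million target implies, assuming the theoretical peak of 32-bit, 33 MHz PCI (4 bytes per clock) rather than a measured transfer rate. The helper names are hypothetical.

```python
# What 3 million triangles/second implies per frame and per triangle.
# Assumption: theoretical PCI peak of 4 bytes per clock at 33 MHz;
# real sustained bandwidth will be lower.
PCI_PEAK_BYTES_PER_SEC = 33_000_000 * 4


def triangles_per_frame(triangles_per_sec: float, frames_per_sec: float) -> float:
    """Triangles available to each frame at a given frame rate."""
    return triangles_per_sec / frames_per_sec


def byte_budget_per_triangle(bus_bytes_per_sec: float, triangles_per_sec: float) -> float:
    """Largest average triangle size the bus can carry at that rate."""
    return bus_bytes_per_sec / triangles_per_sec


if __name__ == "__main__":
    print(triangles_per_frame(3_000_000, 30))
    print(byte_budget_per_triangle(PCI_PEAK_BYTES_PER_SEC, 3_000_000))
```

Under that assumption the average triangle has to fit in roughly 44 bytes, against 128 bytes for an uncompressed 32-word packet.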

As a wild guess, each trapezoid will need 2 or 3 clocks for 
non-perspective setup, or more if it is done iteratively to save 
multipliers.  So we'd have to drop below a handful of pixels/triangle 
before setup becomes a bottleneck.  By spending some extra real estate, 
I imagine the setup overhead could be pipelined away even for single 
pixel triangles, and the host ought to be able to cull zero pixel 
triangles unless we're computing coverage masks, which for sure won't 
happen on the initial rev.
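
The bottleneck argument can be put as a toy model. It assumes the rasterizer fills one pixel per clock and that a pipelined design overlaps setup for the next trapezoid with filling the current one. Both are assumptions for illustration, not properties of the actual design.

```python
# Toy model of setup cost versus fill cost per trapezoid.
# Assumptions (hypothetical): one pixel filled per clock; when setup is
# pipelined, it overlaps with filling the previous trapezoid.


def clocks_per_trapezoid(pixels: int, setup_clocks: int, pipelined: bool) -> int:
    """Clocks consumed by one trapezoid under the toy model."""
    if pipelined:
        # Setup hides behind fill unless it takes longer than the fill.
        return max(pixels, setup_clocks)
    # Without overlap, setup and fill happen back to back.
    return pixels + setup_clocks


def is_setup_bound(pixels: int, setup_clocks: int) -> bool:
    """True when pipelined setup, not fill, limits throughput."""
    return setup_clocks > pixels


if __name__ == "__main__":
    for pixels in (1, 2, 3, 5, 10):
        print(pixels, clocks_per_trapezoid(pixels, 3, True), is_setup_bound(pixels, 3))
```

With a 3-clock setup, only trapezoids of one or two pixels are setup-bound in this model, which matches the "handful of pixels" intuition.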

Per-parameter perspective divides should probably be done on the host 
for the time being, meaning the 8.8 fixed point format is not 
appropriate and the temptation is to go to 24 bit fp per non-geometry 
parameter, adding an extra 6 bytes per textured triangle and messing up 
the internal alignment a little.  With a little bit of repacking, I 
think we can still hit 3 million triangles/second even before putting 
in the effort to implement more efficient primitives.
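
The 6 extra bytes follow from the format sizes: 8.8 fixed point is 2 bytes and a 24 bit value is 3, so each parameter value grows by one byte. The sketch below assumes six such values per textured triangle (for example two texture coordinates at each of three vertices), which is a guess at the layout rather than a stated format.

```python
# Size growth when moving parameters from 8.8 fixed point to 24-bit fp.
FIXED_8_8_BYTES = 2   # 8 integer bits + 8 fraction bits
FP24_BYTES = 3        # 24 bits per value

# Assumption: six non-geometry values per textured triangle, e.g. (u, v)
# at each of three vertices. The real packet layout is not specified here.
VALUES_PER_TEXTURED_TRIANGLE = 6


def extra_bytes(values: int) -> int:
    """Additional bytes per triangle after widening each value."""
    return values * (FP24_BYTES - FIXED_8_8_BYTES)


if __name__ == "__main__":
    print(extra_bytes(VALUES_PER_TEXTURED_TRIANGLE))
```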

As far as host CPU requirements go, it would be a shame to let that 
lovely Athlon 64 just sit there idle.  SSE2 and 3DNow! are perfect for 
this task.  For simple textured triangles at 5 million triangles/second 
we're only asking for about 60 million divides/second on machines that 
deliver well over a gigaflop, and there's plenty of room for 
optimization.  So there will be lots of CPU left over for game physics.
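
The divide budget works out as follows. This sketch counts each divide as a single floating point operation, which flatters the CPU since divides are slower than adds or multiplies, so treat the fraction as a lower bound on the real cost.

```python
# Host-side divide budget for perspective correction.
# Assumption: one divide is counted as one flop, which is optimistic.
DIVIDES_PER_SEC = 60_000_000
TRIANGLES_PER_SEC = 5_000_000
HOST_FLOPS = 1_000_000_000  # "well over a gigaflop", taken as a floor


def divides_per_triangle(divides_per_sec: float, triangles_per_sec: float) -> float:
    """Average number of divides spent on each triangle."""
    return divides_per_sec / triangles_per_sec


def cpu_fraction(divides_per_sec: float, host_flops: float) -> float:
    """Share of peak floating point throughput used by the divides."""
    return divides_per_sec / host_flops


if __name__ == "__main__":
    print(divides_per_triangle(DIVIDES_PER_SEC, TRIANGLES_PER_SEC))
    print(f"{cpu_fraction(DIVIDES_PER_SEC, HOST_FLOPS):.0%}")
```

That is 12 divides per triangle and about 6% of a gigaflop under the stated assumption.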

All of the above "in my humble opinion" of course.

Regards,

Daniel