Ok, I want you to hold some of those thoughts and come back to them in force in a little while. We're not QUITE ready to define a register interface, but when we do, I would like to ask you to provide your input on ways of optimizing throughput.

On Wed, 2 Feb 2005 17:14:59 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> Hi Timothy,
>
> On Wednesday 02 February 2005 15:56, Timothy Miller wrote:
> > On Wed, 2 Feb 2005 15:32:45 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > > So this is all by way of convincing myself that if we do the DMA in
> > > a fairly simple-minded way, it doesn't work out too badly. Any
> > > major oversights?
> >
> > No, this is good analysis. One thing: My initial estimate for
> > maximum triangle throughput was 1 million triangles/second. That's
> > from an estimate of 32-word command packets on 33 MHz PCI. Someone
> > else told me that that's not bad, considering the CPU overhead just
> > for computing the geometry in the first place.
>
> So far, so good.
>
> Using the simple-minded "small triangle" compression, 3 million
> triangles/second seems like a more worthy goal. That's 100 thousand
> triangles/frame at 30 frames/sec, not too shabby for a PCI card. The
> on-card triangle setup doesn't look like a bottleneck at all. Maybe
> it's time to take a stab at estimating that.
>
> As a wild guess, each trapezoid will need 2 or 3 clocks for
> non-perspective setup, or more if it is done iteratively to save
> multipliers. So we'd have to drop below a handful of pixels/triangle
> before setup becomes a bottleneck. By spending some extra real estate,
> I imagine the setup overhead could be pipelined away even for
> single-pixel triangles, and the host ought to be able to cull
> zero-pixel triangles unless we're computing coverage masks, which for
> sure won't happen on the initial rev.
>
> Per-parameter perspective divides should probably be done on the host
> for the time being, meaning the 8.8 fixed point format is not
> appropriate and the temptation is to go to 24-bit fp per non-geometry
> parameter, adding an extra 6 bytes per textured triangle and messing up
> the internal alignment a little. With a little bit of repacking, I
> think we can still hit 3 million triangles/second even before putting
> in the effort to implement more efficient primitives.
>
> As far as host CPU requirements go, it would be a shame to let that
> lovely Ath64 just sit there idle. SSE2 and 3DNow! are perfect for this
> task. For simple textured triangles at 5 million triangles/second
> we're only asking for about 60 million divides/second on machines that
> deliver well over a gigaflop, and there's plenty of room for
> optimization. So there will be lots of CPU left over for game physics.
>
> All of the above "in my humble opinion" of course.
>
> Regards,
>
> Daniel
>
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
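
For reference, the bus arithmetic behind the 1 million and 3 million triangles/second figures in the thread can be sketched as below. This is only a back-of-envelope check, assuming the PCI bus moves one command word per clock with no arbitration, burst-setup, or protocol overhead (real throughput will be lower). The function and constant names are illustrative and do not come from the thread.

```python
# Back-of-envelope PCI triangle throughput, using figures quoted in the thread.
# ASSUMPTION: one command word crosses the bus on every clock (ideal peak rate).

PCI_CLOCK_HZ = 33_000_000    # 33 MHz PCI (from the thread)
WORDS_PER_TRIANGLE = 32      # 32-word command packet (from the thread)


def triangles_per_second(words_per_triangle, clock_hz=PCI_CLOCK_HZ):
    """Peak triangles/second when each triangle costs this many bus words."""
    return clock_hz / words_per_triangle


def words_budget(target_tris_per_sec, clock_hz=PCI_CLOCK_HZ):
    """Largest average packet size (in words) that still meets the target rate."""
    return clock_hz / target_tris_per_sec


def triangles_per_frame(tris_per_sec, frames_per_sec):
    """Triangles available per frame at a given frame rate."""
    return tris_per_sec / frames_per_sec


baseline = triangles_per_second(WORDS_PER_TRIANGLE)  # roughly 1 million
budget_3m = words_budget(3_000_000)                  # words per triangle at 3 M/s
per_frame = triangles_per_frame(3_000_000, 30)       # triangles per frame

print(f"32-word packets: {baseline:,.0f} triangles/s")
print(f"3 M triangles/s needs <= {budget_3m:.1f} words/triangle on average")
print(f"3 M triangles/s at 30 fps: {per_frame:,.0f} triangles/frame")
```

Under these idealized assumptions, reaching 3 million triangles/second means shrinking the average packet from 32 words to about 11, which is the role the "small triangle" compression plays in the discussion above.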
