Ok, I want you to hold some of those thoughts and come back to them in
force in a little while.  We're not QUITE ready to define a register
interface, but when we do, I would like to ask you to provide your
input on ways of optimizing throughput.



On Wed, 2 Feb 2005 17:14:59 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> Hi Timothy,
> 
> On Wednesday 02 February 2005 15:56, Timothy Miller wrote:
> > On Wed, 2 Feb 2005 15:32:45 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > > So this is all by way of convincing myself that if we do the DMA in
> > > a fairly simple-minded way, it doesn't work out too badly.  Any
> > > major oversights?
> >
> > No, this is good analysis.  One thing:  My initial estimate for
> > maximum triangle throughput was 1 million triangles/second.  That's
> > from an estimate of 32-word command packets on 33 MHz PCI.  Someone
> > else told me that that's not bad, considering the CPU overhead just
> > for computing the geometry in the first place.
> 
> So far, so good.
> 
> Using the simple-minded "small triangle" compression, 3 million
> triangles/second seems like a more worthy goal.  That's 100 thousand
> triangles/frame at 30 frames/sec, not too shabby for a PCI card.  The
> on-card triangle setup doesn't look like a bottleneck at all.  Maybe
> it's time to take a stab at estimating that.
> 
> As a wild guess, each trapezoid will need 2 or 3 clocks for
> non-perspective setup, or more if it is done iteratively to save
> multipliers.  So we'd have to drop below a handful of pixels/triangle
> before setup becomes a bottleneck.  By spending some extra real estate,
> I imagine the setup overhead could be pipelined away even for single
> pixel triangles, and the host ought to be able to cull zero pixel
> triangles unless we're computing coverage masks, which for sure won't
> happen on the initial rev.
> 
> Per-parameter perspective divides should probably be done on the host
> for the time being, meaning the 8.8 fixed point format is not
> appropriate and the temptation is to go to 24 bit fp per non-geometry
> parameter, adding an extra 6 bytes per textured triangle and messing up
> the internal alignment a little.  With a little bit of repacking, I
> think we can still hit 3 million triangles/second even before putting
> in the effort to implement more efficient primitives.
> 
> As far as host CPU requirements go, it would be a shame to let that
> lovely Ath64 just sit there idle.  SSE2 and 3DNow! are perfect for this
> task.  For simple textured triangles at 5 million triangles/second
> we're only asking for about 60 million divides/second on machines that
> deliver well over a gigaflop, and there's plenty of room for
> optimization.  So there will be lots of CPU left over for game physics.
> 
> All of the above "in my humble opinion" of course.
> 
> Regards,
> 
> Daniel
>
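
For anyone who wants to sanity-check the figures quoted above, here is a
rough back-of-the-envelope sketch.  It assumes a 32-bit PCI bus moving one
4-byte word per clock at a nominal 33 MHz, with no arbitration or burst
setup overhead; the thread only says "33 MHz PCI", so those details are my
assumptions, and the function names are purely illustrative.  Real
sustained throughput will be lower than these peak numbers.

```python
# Back-of-the-envelope PCI triangle budget.
# ASSUMPTIONS (not stated in the thread): 32-bit bus, one word per clock,
# zero bus overhead.  These give upper bounds, not measured rates.

PCI_WORDS_PER_SECOND = 33_000_000  # nominal 33 MHz, one 32-bit word per clock


def peak_triangles_per_second(words_per_triangle):
    """Peak triangle rate if each triangle costs this many bus words."""
    return PCI_WORDS_PER_SECOND / words_per_triangle


def word_budget_per_triangle(triangles_per_second):
    """Bus words available per triangle at a target triangle rate."""
    return PCI_WORDS_PER_SECOND / triangles_per_second


def triangles_per_frame(triangles_per_second, frames_per_second):
    """Triangles available to each frame at a given frame rate."""
    return triangles_per_second / frames_per_second


def divides_per_triangle(divides_per_second, triangles_per_second):
    """Host divides implied per triangle by the quoted totals."""
    return divides_per_second / triangles_per_second


if __name__ == "__main__":
    # 32-word packets: roughly the 1 million triangles/second estimate.
    print(peak_triangles_per_second(32))
    # Words each triangle may occupy to reach 3 million triangles/second.
    print(word_budget_per_triangle(3_000_000))
    # 3 million triangles/second at 30 frames/second.
    print(triangles_per_frame(3_000_000, 30))
    # 60 million divides/second at 5 million triangles/second.
    print(divides_per_triangle(60_000_000, 5_000_000))
```

Under these assumptions, 32-word packets land just over 1 million
triangles/second, and hitting 3 million means each triangle has to fit in
about 11 words (44 bytes), which is why the compression and packing
matter so much.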
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)