On Thu, 10 Feb 2005 14:47:16 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:

> 
> Yes, it would be cool to be able to hotrod the card with alternate
> logic, but is it worth the development cost of opening up a design
> fork, and with Timothy as the overloaded resource?
> 
> Optimizing is always frustrating, yet it seems to work out in the end.
> As it is, it seems we're heading towards a design that delivers both
> respectable throughput and decent accuracy.  I think we just need to
> keep plugging away in that direction.

We need this design to be the lowest common denominator.
Based on notoriety (and perhaps some revenue) from this, we can start
developing more, better, specialized, and generalized products.

However, this initial design will live a long life.  Even past the
point when we release the RTL, an embedded version at ever-decreasing
price will find its way into all sorts of interesting embedded apps
and cheap graphics cards.  Therefore, we need to make it as good as we
can, while realizing that there are compromises to be made.

> 
> There's another way to approach the span setup bottleneck: set up a few
> spans in advance and save them in a queue.  This will cost about 100
> bytes of distributed ram per queue element.  The queue can deliver
> spans to the rasterizer at full clock speed until it drains, then the
> pipeline stalls as before.  However, this maps well onto the small
> triangle case where each of two trapezoids is wide at the top or bottom
> and narrow on the other side.  The queue fills during wide spans and
> drains during narrow ones.  So span setup can be designed to handle
> average load instead of worst case and thus get away with fewer
> multipliers.

Well, by the nature of having a pipelined design that can pack (which
is to say, when later stages stall, earlier bubbles get filled), there
will be some small amount of queueing between h and v (assuming
they're separate).  And it may be very much worthwhile to increase
that buffering slightly.
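
To make the queueing argument concrete, here is a toy cycle-level model
of a span queue sitting between setup and the rasterizer.  It is only a
sketch under assumptions that are not from the actual design: setup
takes a fixed number of cycles per span, the rasterizer draws one pixel
per cycle, and a span finished in a given cycle can be consumed in that
same cycle.  The names simulate, setup_cycles and queue_depth are made
up for illustration.

```python
from collections import deque


def simulate(span_widths, setup_cycles, queue_depth):
    """Toy model: return (total_cycles, rasterizer_stall_cycles).

    Assumptions (illustrative only): every span has width >= 1, setup
    costs setup_cycles per span, and the rasterizer draws 1 pixel/cycle.
    """
    pending = deque(span_widths)  # spans not yet set up (values are widths)
    queue = deque()               # set-up spans waiting for the rasterizer
    in_setup = None               # width of the span currently in setup
    setup_left = 0                # setup cycles remaining for that span
    raster_left = 0               # pixels remaining in the span being drawn
    cycles = stalls = 0

    while pending or in_setup is not None or queue or raster_left:
        # Setup stage: start a new span only if the queue has room.
        if in_setup is None and pending and len(queue) < queue_depth:
            in_setup = pending.popleft()
            setup_left = setup_cycles
        if in_setup is not None:
            setup_left -= 1
            if setup_left == 0:
                queue.append(in_setup)
                in_setup = None

        # Raster stage: fetch the next span when idle, then draw one pixel.
        if raster_left == 0 and queue:
            raster_left = queue.popleft()
        if raster_left:
            raster_left -= 1
        else:
            stalls += 1  # nothing to draw this cycle
        cycles += 1

    return cycles, stalls


# Wide-then-narrow pattern, loosely like the two trapezoids of a small
# triangle: the average width (4.5) exceeds the setup cost (4 cycles).
workload = ([8] * 4 + [1] * 4) * 4
shallow = simulate(workload, setup_cycles=4, queue_depth=1)
deep = simulate(workload, setup_cycles=4, queue_depth=4)
```

In this model the deeper queue lets setup run ahead during the wide
spans, so the rasterizer stalls less during the narrow ones even though
setup is sized for the average load rather than the worst case.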

> This pleasant behavior extends to trapezoid setup as well, and we could
> even contemplate bringing triangle setup onto the card.  

I think we're going to face a greater challenge figuring out what to
REMOVE from the design when we run out of transistors.  Fitting in
triangle setup, which involves actual divides that have to be
high-accuracy, is not going to be reasonable to expect.

> One thing
> that's been bothering me about delivering trapezoids to the card
> instead of triangles is, two trapezoids are quite a lot bulkier than
> one triangle and it's harder to come up with simple compression tricks
> like I suggested earlier for small triangles.  So, do only the
> expensive reciprocal on the host and multiply out the perspective
> gradients on the card.  It would be nice to be able to handle something
> in the neighborhood of 3 million triangles/sec on PCI and this strategy
> just might get there.

Yes, I'll be ready to start thinking about packet compression when a
few more things are ironed out.
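
For a sense of the budget any packet format has to meet, here is a
back-of-the-envelope calculation.  It assumes conventional 32-bit PCI
at a nominal 33 MHz (the thread only says "PCI"), and the efficiency
parameter is a made-up knob for bus overhead, not a measured figure.

```python
PCI_CLOCK_HZ = 33_000_000  # assumption: nominal 33 MHz conventional PCI
PCI_BUS_BYTES = 4          # assumption: 32-bit bus, 4 bytes per transfer


def bytes_per_triangle(triangles_per_sec, efficiency=1.0):
    """Bytes available per triangle at the given triangle rate.

    efficiency is a hypothetical fraction of peak bandwidth actually
    achieved (1.0 means the theoretical peak).
    """
    peak_bytes_per_sec = PCI_CLOCK_HZ * PCI_BUS_BYTES
    return peak_bytes_per_sec * efficiency / triangles_per_sec


budget = bytes_per_triangle(3_000_000)   # theoretical peak
words = budget / PCI_BUS_BYTES           # same budget in 32-bit words
```

At theoretical peak that works out to 44 bytes, or eleven 32-bit words,
per triangle at 3 million triangles/sec, and less once real bus
overhead is taken into account.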
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
