On Sun, 30 Jan 2005 20:54:40 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> Hi Timothy,
> 
> You mentioned that the logic should end up clocked somewhere around
> 200 MHz but that the card would be capable of 400 MHz fill rate.  This
> means two pixel pipelines, no?  This would seem to correspond well with
> the number of multipliers available.  The question is, how best to
> factor the work between the units?  I can imagine several alternatives:
> 
>   1) Two horizontally adjacent pixels per clock

This one.  It ensures that the memory controller can maximize
throughput for writes (well, makes it easier), and allows a lot of
logic to be shared between pipelines.  Really, it's not two
pipelines--I'm going to design one pipeline that processes two pixels
and see just how much simplification I can make that way.
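
A rough software sketch of the "one pipeline, two pixels" idea follows. It assumes each clock handles one even-aligned pair of pixels (x = 2k and 2k+1) with a write enable per pixel. The alignment rule and the names pair_schedule, left_enable and right_enable are illustrative assumptions, not something settled in this thread.

```python
def pair_schedule(x0, x1):
    """Return one (base_x, left_enable, right_enable) tuple per clock
    for filling pixels x0 .. x1-1 of a span.

    Assumption (not from the thread): every clock covers the aligned
    pair (2k, 2k+1), and pixels outside the span are masked off.
    """
    schedule = []
    if x1 <= x0:
        return schedule  # empty span, no clocks used
    for pair in range(x0 // 2, (x1 - 1) // 2 + 1):
        base = 2 * pair
        left_enable = x0 <= base < x1
        right_enable = x0 <= base + 1 < x1
        schedule.append((base, left_enable, right_enable))
    return schedule


if __name__ == "__main__":
    # A span starting on an odd pixel wastes half of its first clock.
    for clock, entry in enumerate(pair_schedule(3, 8)):
        print(clock, entry)
```

Under that assumption a span that starts or ends mid-pair still costs a full clock for the partial pair, so short spans see less than the full 2x speedup.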

> 
>   2) Two vertically adjacent pixels per clock
> 
>   3) Two adjacent trapezoid spans in parallel
> 
>   4) Two arbitrary trapezoids in parallel
> 
> The higher you go up the food chain, so to speak, the bigger a penalty
> DRAM row crossing becomes but the more flexibility is introduced.  My
> guess is, you're thinking of alternative (1), since it would need the
> least logic and earlier setup stages don't look like bottlenecks.

Yup.

> Next question, should the pixel units be identical, or should one of
> them be more capable than the other, to handle some of the less common
> but important render combinations?  Then at least one pixel unit could
> still run instead of falling all the way back to software.  This is
> analogous to the way the original Pentium was organized.  It makes a
> lot of sense to me.

The idea is to make them totally unified.

> With alternative (4) above there is a lot of flexibility.  For example,
> if only one of the pixel units is capable of multitexturing, the other
> can be busy with single-textured triangles.  But a fairly complex
> scheduler stage would be needed in front of trapezoid setup, so maybe
> this is a good idea but for a later rev.

This makes sense, but you point out some of the problems.  If space
gets too tight, I may consider something like this.  Once memory
bandwidth is saturated, it doesn't matter how many pipelines there
are.
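
To put numbers on the bandwidth point, here is a minimal sketch. The 200 MHz clock and two pixels per clock come from the thread. The memory bandwidth and bytes-per-pixel figures are made-up placeholders, and effective_fill_rate is a hypothetical helper, not part of any design.

```python
def effective_fill_rate(clock_hz, pixels_per_clock,
                        mem_bytes_per_sec, bytes_per_pixel):
    """Pixels per second, limited by the pipeline or by memory,
    whichever is slower."""
    pipeline_rate = clock_hz * pixels_per_clock
    memory_rate = mem_bytes_per_sec / bytes_per_pixel
    return min(pipeline_rate, memory_rate)


if __name__ == "__main__":
    CLOCK_HZ = 200e6        # from the thread
    MEM_BYTES_PER_SEC = 1.6e9   # placeholder, not a real spec
    for pixels_per_clock in (1, 2, 4):
        for bytes_per_pixel in (4, 8):  # placeholder traffic per pixel
            rate = effective_fill_rate(CLOCK_HZ, pixels_per_clock,
                                       MEM_BYTES_PER_SEC, bytes_per_pixel)
            print(pixels_per_clock, bytes_per_pixel, rate / 1e6, "Mpixel/s")
```

With these placeholder figures, going from two to four pixels per clock changes nothing once memory is the limit, which is the situation described above.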

> It does seem well worth the effort to implement two parallel pixel units
> right from the beginning, doubling the fill rate.  It's probably
> sensible to make them identical and save some design time for the first
> rev.

Making one and instantiating twice would make some of it easier, but I
think I can save some space just coding them together into one.

> In case (1) a nice optimization would be to do the perspective divide
> only every second pixel and linearly interpolate for the other.  This
> adds more logic, but it could conceivably be a way to get to a 4x pixel
> unit design within the limited supply of multipliers.

I've thought about things like that.  I'm not sure what effect it
would have on the image quality.
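
One way to get a feel for the quality question is to measure the error in software first. The sketch below assumes u/w and 1/w are stepped linearly along a span, that even pixels get the exact divide, and that each odd pixel takes the average of its two even neighbours. The function names and that particular interpolation rule are assumptions for illustration only.

```python
def span_texcoords(n, uw0, duw, iw0, diw, divide_every_other=False):
    """Texture coordinate u for n pixels along a span.

    u/w starts at uw0 and steps by duw; 1/w starts at iw0 and steps
    by diw.  The exact result is u = (u/w) / (1/w) at every pixel.
    With divide_every_other=True, odd pixels that have an even
    neighbour on both sides use the average of those neighbours
    (assumed scheme); a trailing odd pixel keeps the exact value.
    """
    exact = [(uw0 + i * duw) / (iw0 + i * diw) for i in range(n)]
    if not divide_every_other:
        return exact
    approx = list(exact)
    for i in range(1, n - 1, 2):
        approx[i] = 0.5 * (exact[i - 1] + exact[i + 1])
    return approx


def max_abs_error(n, uw0, duw, iw0, diw):
    """Largest difference in u between the exact and skipped-divide spans."""
    exact = span_texcoords(n, uw0, duw, iw0, diw)
    approx = span_texcoords(n, uw0, duw, iw0, diw, divide_every_other=True)
    return max(abs(a - b) for a, b in zip(exact, approx))


if __name__ == "__main__":
    # 9-pixel span, u from 0 to 1, w from 1 to 4 (steep perspective).
    print(max_abs_error(9, 0.0, 0.25 / 8, 1.0, -0.75 / 8))
    # Same span with constant w (no perspective): essentially no error.
    print(max_abs_error(9, 0.0, 1.0 / 8, 1.0, 0.0))
```

The error vanishes when w is constant across the span and grows where w changes quickly between neighbouring pixels, so the visible effect would depend on how steeply surfaces are angled and on the texture resolution.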

> Just thinking aloud here and trying to get my mind wrapped around the
> tradeoffs involved.

That's what we're here for!  :)