On Sun, 6 Feb 2005 00:02:23 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> On Saturday 05 February 2005 22:13, Timothy Miller wrote:
> > On Sat, 5 Feb 2005 16:29:59 -0500, Daniel Phillips wrote:
> > > I'm sure you've already solved this, Timothy.  However, I need to
> > > get a more accurate handle on what's possible with this hardware,
> > > so here we go again.
> >
> > Well, you do and you don't.  The solution to the problem is something
> > that the driver developer never needs to know about.
> >
> > I can tell you how it will work, but it's not really something you
> > NEED to worry about.
> 
> Not knowing how the logic goes together means only ever being able to
> guess wildly at how features map to hardware resources, which isn't
> very satisfying.  But I guess I "need" to know it because at heart, I
> am an information packrat.

You misunderstand.  The objective is to make the hardware
functionality exactly match the features.  That one part may require
more or less logic than you expect is immaterial.  It's simply an
implementation.

> 
> My self assessment of my second attempt is, there's something workable
> in there but it's comically crude.  Now let me see if I can come up
> with something a little more realistic.  I imagine that carry
> propagation is what gets in the way of the two 8 bit adds completing
> within 5 ns.  So my next attempt staggers the two adders by one clock.
> Each of the two half sums are fed right back into the adder on every
> clock, and also fed to output latches.  An additional latch delays the
> low 8 bits of the result so the 16 bit accumulated sum is available to
> following stages on the same clock (and I finally get to use my queue).

Actually, it turns out that the problem with the 16-bit subtract had
nothing to do with the subtractor.  I was synthesizing the logic for a
S3 2000, but it took up less than 1% of the design, so what happened
was it was routing the block inputs and outputs to pins on opposite
sides of the chip, stretching it out over the whole chip.  I switched
to the smallest S3, and that solved the problem.

> This is just a fragment of a floating point adder of course, but the
> object of the exercise was to convince myself that the interpolater
> can, in theory, run at full speed.  Is the logic roughly right?  Now
> that I've had fun trying to roll my own solution, could you please tell
> me how it should be done?

Well, say it took 4 cycles to compute one sum.  Then what you need is
dZ, dZ*2, dZ*3, and dZ*4, all of which are either trivial or easy to
compute.  You use dZ*4 to get to the next loop, and send Z,0; Z,dZ;
Z,dZ*2; and Z,dZ*3 down the pipeline.
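
To make that concrete, here is a small software model of the idea.  It
is only a sketch in Python, not the actual hardware design: the name
interpolate_strided and the latency parameter are made up for
illustration, and Z is treated as a plain integer rather than a
floating point value.  The slow add (Z + dZ*4) happens once per group,
and the pairs (Z, k*dZ) are what would go down the pipeline for a
later stage to sum.

```python
def interpolate_strided(z0, dz, count, latency=4):
    """Produce z0, z0+dz, z0+2*dz, ... when the accumulating add
    takes `latency` cycles to complete (hypothetical model)."""
    # Offsets 0, dZ, dZ*2, dZ*3: fixed for the whole span.
    offsets = [k * dz for k in range(latency)]
    # dZ*4: the stride used to get to the next loop iteration.
    stride = latency * dz

    out = []
    base = z0
    while len(out) < count:
        # Send (Z, 0), (Z, dZ), (Z, dZ*2), (Z, dZ*3) down the pipeline;
        # a later pipeline stage adds each pair.
        for off in offsets:
            if len(out) == count:
                break
            out.append(base + off)
        # The slow accumulate: only needed once every `latency` outputs.
        base += stride
    return out


if __name__ == "__main__":
    print(interpolate_strided(10, 3, 10))
```

The point is that the feedback loop only has to produce one new Z every
4 cycles, while the pipeline still gets one value per cycle.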

Happy now?  :)
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
