On Tue, 8 Feb 2005 19:59:54 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> On Tuesday 08 February 2005 18:11, Lourens Veen wrote:
> > On Tuesday 08 February 2005 22:06, Daniel Phillips wrote:
> > > It's the interpolants that are really going to eat multipliers:
> > >
> > > Horizontal rasterization:
> > >
> > > - two multiplies per interpolant for perspective correction
> > >
> > > Vertical rasterization:
> > >
> > > - one multiply per interpolant to correct for pixel alignment
> >
> > Are these in the model yet?
>
> Yes.
>
> > > With 17 interpolants, most of which need perspective correction (in
> > > my opinion; some may think this justifiable only for textures)
> > > we've already exceeded our multiplier budget and haven't even begun
> > > to think about filtering, blending, mipmapping, fog and probably
> > > other things.
> >
> > If we can make a reciprocal with a LUT and some logic, maybe we can
> > do the same for a multiplier? Haven't thought it through, but
> > multiplying is generally easier than dividing.
>
> Yes. I suppose that is why multipliers start disappearing when you use
> the larger ram blocks.

No, the multipliers are dedicated logic. If you wanted to put an 18x18 multiplier into a RAM block, you'd need a 36-bit address. Where are you going to fit 2^36 entries, each 36 bits wide? The reason the multipliers disappear when you use RAMs in 36-bit mode is that each multiplier is paired with a RAM block, and they share some data lines. Apparently, the pairing is useful for digital signal processing like FFTs and stuff, but since we want to use them independently, we have to live with some limitations.

> > > So pretty soon it's time to make some hard choices about what is
> > > expendable, where to compromise on quality and throughput, and how
> > > throughput is going to degrade gracefully as features are turned
> > > on. All of which I'm sure Timothy has been thinking about, but now
> > > it's about time to take inventory and see just how bad things are.
> >
> > How complete is the software model right now? I think it would be a
> > good idea to try and complete that as much as possible. It will give
> > a complete picture of what we need to do, and a framework to figure
> > out what is the best compromise.
>
> Did you really mean to direct all these questions to Timothy? Anyway:
> it appears to be a rather well thought out implementation of the OpenGL
> 1.3 rendering spec, though I still haven't researched a lot of OpenGL
> details thoroughly enough to know for sure. Others here have.

Well, if we need to add functionality, we need to figure that out. Otherwise, the most useful thing to do right now is to use the float25 class and perhaps start making other modifications that reflect the implementation (like fixed-point). But it may be too early to make some of those decisions. I suggest we work out exactly how many fixed-point bits each fragment attribute needs after perspective divide.
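
The objection to a table-lookup multiplier is easy to quantify. A table that multiplies an n-bit operand by an m-bit operand needs one entry per operand pair (2^(n+m) entries), each n+m bits wide. The minimal Python sketch below does that arithmetic. The function name is hypothetical, and the 18-Kbit block RAM size is an assumption (it is the size of the block RAMs paired with the 18x18 multipliers on Xilinx Spartan-3 and Virtex-II parts).

```python
def lut_multiplier_bits(a_bits: int, b_bits: int) -> int:
    """Total storage, in bits, of a table holding every product of an
    a_bits-wide operand and a b_bits-wide operand."""
    address_bits = a_bits + b_bits  # both operands together form the table address
    entries = 1 << address_bits     # one entry per possible operand pair
    width = a_bits + b_bits         # each entry holds the full-width product
    return entries * width


# Assumption: 18-Kbit block RAMs (16 Kbit of data plus 2 Kbit of parity).
BLOCK_RAM_BITS = 18 * 1024

for n in (4, 8, 18):
    total = lut_multiplier_bits(n, n)
    print(f"{n}x{n} table: {total} bits, about {total / BLOCK_RAM_BITS:.3g} block RAMs")
```

For 18x18 this comes to 2^36 × 36 ≈ 2.5 × 10^12 bits, or 2^27 block RAMs, while a 4x4 table (2,048 bits) fits easily in one block. That is why table lookup only pays off for very narrow operands.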
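
On the reciprocal side (the "LUT and some logic" idea, and the perspective divide mentioned above), one common construction is a small seed table refined by Newton-Raphson iteration. The sketch below is a generic illustration of that technique, not necessarily what the model or the hardware does: the 6-bit table index and all names are hypothetical, and it uses Python floats where the hardware would use fixed point. Each Newton-Raphson step costs two multiplies and roughly doubles the number of correct bits, so table size trades directly against the multiplier budget.

```python
import math

LUT_INDEX_BITS = 6  # hypothetical table size: 2**6 = 64 seed entries

# Seed table: entry i holds 1/m for the midpoint m of [1 + i/64, 1 + (i+1)/64).
RECIP_LUT = [1.0 / (1.0 + (i + 0.5) / (1 << LUT_INDEX_BITS))
             for i in range(1 << LUT_INDEX_BITS)]


def reciprocal(d: float, newton_steps: int = 1) -> float:
    """Approximate 1/d for d > 0 with a table seed plus Newton-Raphson steps."""
    if d <= 0:
        raise ValueError("this sketch handles positive inputs only")
    m, e = math.frexp(d)         # d = m * 2**e with 0.5 <= m < 1
    m, e = 2.0 * m, e - 1        # renormalize so that 1 <= m < 2
    index = int((m - 1.0) * (1 << LUT_INDEX_BITS))  # top fraction bits of m
    x = RECIP_LUT[index]         # seed, good to about LUT_INDEX_BITS + 1 bits
    for _ in range(newton_steps):
        x = x * (2.0 - m * x)    # two multiplies; roughly doubles the correct bits
    return math.ldexp(x, -e)     # 1/d = (1/m) * 2**(-e)


print(reciprocal(3.0))
```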

> > > It's also possible to create more multipliers in random logic, as
> > > Timothy mentioned several times, but this is only going to work out
> > > in places where precision is really limited.
> >
> > And it may be expensive. If a single generic adder takes up 1%, then
> > how much will a multiplier be?
>
> Floating point multiplication is easier than floating point addition:
> you multiply the mantissas, discard the least significant bits, add the
> exponents and xor the signs. It's coming up with lots of dedicated
> fixed point multipliers that is the problem. They can be pretty
> simple, but the simple implementation will eat a lot of logic in the
> form of adders. These simple shift-add multipliers need a fixed point
> add for each stage, and there are as many stages as there are bits.
> Multiplication by table lookup is also possible, as you mentioned.

Only for numbers so small that you're better off using dedicated logic. The biggest reason to use a LUT is for complicated, nonlinear things that are one-in/one-out, like computing a reciprocal. This also includes color/gamma tables and things like that.

> In fact, this FPGA appears to work entirely by lookup tables and doesn't
> actually implement gate logic at all. I don't know, maybe they all
> work that way, but this is the only one I've ever looked at and it does
> seem very cool.

That's a misleading way to put it. Yes, Xilinx CLBs are made up of small look-up tables, but each one is really just a generalized four-input logic gate. Basically, what you have is a register with 16 bits in it and a MUX with 4 select bits. The output of the MUX is the logic function. In addition to those, there are some extra XOR gates, MUXes, and flip-flops/latches, plus some other basic stuff.
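
The floating-point multiply recipe and the point about shift-add multipliers costing one adder per bit can be sketched together in Python. The toy format is hypothetical and is not meant to match the float25 class: a sign bit, an unbiased integer exponent, and a 17-bit mantissa with the hidden leading 1 (16 fraction bits). Two details the summary leaves implicit are shown: since each mantissa lies in [1, 2), the product lies in [1, 4) and needs at most a one-bit renormalization, and a format with a biased exponent would also subtract the bias after adding. The sketch truncates rather than rounds.

```python
FRAC_BITS = 16  # hypothetical: 16 fraction bits after the hidden leading 1


def shift_add_multiply(a: int, b: int, width: int):
    """Unsigned shift-add multiply of a by a width-bit b.
    Returns (product, adder_stages); hardware needs one adder per bit of b."""
    product = 0
    stages = 0
    for i in range(width):
        stages += 1                  # one adder stage per multiplier bit, used or not
        if (b >> i) & 1:
            product += a << i        # add the multiplicand shifted into position
    return product, stages


def fp_multiply(x, y):
    """Multiply toy floats given as (sign, exponent, mantissa) tuples, where
    value = (-1)**sign * mantissa / 2**FRAC_BITS * 2**exponent and the mantissa
    has its hidden 1 set (2**FRAC_BITS <= mantissa < 2**(FRAC_BITS + 1))."""
    sx, ex, mx = x
    sy, ey, my = y
    sign = sx ^ sy                   # xor the signs
    exp = ex + ey                    # add the (unbiased) exponents
    prod, _ = shift_add_multiply(mx, my, FRAC_BITS + 1)  # multiply the mantissas
    # prod has 2*FRAC_BITS fraction bits and represents a value in [1, 4).
    if prod >> (2 * FRAC_BITS + 1):  # value >= 2: renormalize by one bit
        prod >>= 1
        exp += 1
    mant = prod >> FRAC_BITS         # discard the least significant bits (truncate)
    return sign, exp, mant


def to_float(x):
    """Convert a toy float tuple to a Python float for checking."""
    sign, exp, mant = x
    return (-1) ** sign * mant / (1 << FRAC_BITS) * 2.0 ** exp


one_and_a_half = (0, 0, 3 << (FRAC_BITS - 1))        # 1.5
minus_two_and_a_half = (1, 1, 5 << (FRAC_BITS - 2))  # -1.25 * 2**1
print(to_float(fp_multiply(one_and_a_half, minus_two_and_a_half)))  # prints -3.75
```

Note that the 17-bit mantissa multiply alone implies 17 adder stages in a plain shift-add implementation, which is the logic cost the thread is worried about.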
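
The look-up table described above, a 16-bit configuration register feeding a 16:1 MUX whose four select lines are the logic inputs, can be modeled in a few lines of Python. This models only the LUT itself (not the carry logic, extra MUXes, or flip-flops), and the function names and the input-to-bit ordering are hypothetical, not Xilinx's actual configuration format.

```python
def config_for(func) -> int:
    """Build the 16-bit configuration word that makes a 4-input LUT compute
    func(a, b, c, d); bit k of the word is the output when the inputs spell k."""
    word = 0
    for k in range(16):
        a, b, c, d = k & 1, (k >> 1) & 1, (k >> 2) & 1, (k >> 3) & 1
        if func(a, b, c, d):
            word |= 1 << k
    return word


def make_lut4(word: int):
    """Model a 4-input LUT: a 16-bit register feeding a 16:1 MUX whose four
    select lines are the logic inputs."""
    if not 0 <= word < 1 << 16:
        raise ValueError("configuration word must fit in 16 bits")

    def lut(a: int, b: int, c: int, d: int) -> int:
        select = (d << 3) | (c << 2) | (b << 1) | a  # inputs drive the MUX select
        return (word >> select) & 1                   # MUX picks one stored bit
    return lut


and4 = make_lut4(config_for(lambda a, b, c, d: a & b & c & d))
xor4 = make_lut4(config_for(lambda a, b, c, d: a ^ b ^ c ^ d))
print(hex(config_for(lambda a, b, c, d: a ^ b ^ c ^ d)))  # prints 0x6996
```

Any four-input Boolean function fits this way, which is why the same LUT can serve as an AND gate, a parity gate, or the sum bit of an adder.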

> I still have only an inkling of where the boundaries lie in terms of
> FPGA resources, but the picture that's beginning to emerge is that the
> render model as defined will happily use up all the resources this FPGA
> has to offer, without some really careful shoehorning. The thing that
> makes it hard is trying to get everything running in parallel at a
> steady two pixels per clock. And as Timothy mentioned a few times,
> having enough resources is only half the battle, there's also routing
> to worry about.
>
> It's fun, isn't it? :-)

Indeed. :)

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
