On Friday 04 February 2005 23:40, you wrote:
> On Friday 04 February 2005 15:21, Lourens Veen wrote:
> > On Friday 04 February 2005 20:47, Daniel Phillips wrote:
> > > On Friday 04 February 2005 09:14, Lourens Veen wrote:
> >
> > <fast divide for perspective correction>
> >
> > > > What kind of precision is acceptable for this?
> > >
> > > Hi Lourens,
> > >
> > > My kneejerk reaction is that 16 bits is required for the
> > > perspective divide. I did experiment with 16/8 divides at one point
> > > in software but never managed to produce stable results.  Floating
> > > point output might have helped there, but I really expect 8 bit
> > > divide precision to cause a lot of easily visible artifacts.
> >
> > Well, we're not really limited to those two options. The input value
> > is a 24-bit float, with 8 bit exponent and 17 bits of mantissa
> > including the hidden 1 bit, and what we need is the reciprocal.
<snip>
> > because a division unit
> > takes up way too much space on the FPGA. We do have some 18-bit
> > integer multipliers, but they're scarce, so we only want to use them
> > if nothing else is good enough.
>
> This is really an excellent place to burn a multiplier or two, please
> don't be shy :-)

Well, not if we don't need it :-). Anyway, the main problem is that we can 
only get 36 bits of data out of the LUT for two reciprocals, and that the LUT 
has a maximum of 1024 entries.

> > Now, 1k words means that we use the topmost 10 bits of the input
> > mantissa as an index into the table. If we store the 16-bit results
> > in the table, that gives us 9 bits of precision (1/x is not a linear
> > function, and we lose a bit in the approximation). That is, you get a
> > 16-bit value back, but the least significant bits are garbage.
>
> Speaking more precisely, they've been quantized.

Exactly. I haven't really worked out what the best way of defining precision 
is. If 90% of your values are correct to 15 bits but you have a few outliers 
that are only correct to 13 bits, what is your precision? 15 bits? 13 bits? 
Somewhere in between? In other words, do you use the maximum error, or the RMS 
error, or something else?

I guess in the end what really matters is whether it looks good or not.
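To make the max-versus-RMS question concrete, here is a quick Python model of 
the direct-lookup scheme quoted below; the table layout (midpoint sampling, 
16-bit fixed-point entries) is my assumption, not the actual hardware, and the 
bit count you get out depends on exactly those conventions:

```python
import math

LUT_BITS = 10   # 1024-entry table, indexed by the top 10 mantissa bits
OUT_BITS = 16   # each entry is a 16-bit fixed-point reciprocal

# Entry i approximates 1/m at the midpoint of its interval; 1/m lies in
# (0.5, 1] for m in [1, 2), so scaling by 2^16 fits the result in 16 bits.
table = [round(2**OUT_BITS / (1.0 + (i + 0.5) / 2**LUT_BITS))
         for i in range(2**LUT_BITS)]

def recip_lut(m):
    """Approximate 1/m for m in [1, 2) by direct table lookup."""
    i = int((m - 1.0) * 2**LUT_BITS)
    return table[i] / 2**OUT_BITS

# Sweep the mantissa range and compare maximum error against RMS error.
max_err = 0.0
sq_sum = 0.0
N = 2**16
for k in range(N):
    m = 1.0 + k / N
    err = abs(recip_lut(m) - 1.0 / m)
    max_err = max(max_err, err)
    sq_sum += err * err
rms_err = math.sqrt(sq_sum / N)
```

With this particular layout the worst-case error lands near 2^-11, i.e. 
roughly 10 good bits out of the 16 returned, while the RMS error is noticeably 
smaller; which of those two numbers you call "the precision" is exactly the 
ambiguity above.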

> > You
> > can be sure however that if you round it to a 9-bit value, you get
> > the same result as when you had rounded the correct result to 9 bits.
> > 9 bits is not too good though.
> >
> > Second option is to read two consecutive 16-bit values from the LUT,
> > and do a linear interpolation between them. That costs us a
> > multiplier, but gives 15 bits precision. Unfortunately, it requires
> > reading two words from the RAM block per pixel, and we can only read
> > two words at a time. So this is not going to work for a dual-pixel
> > pipeline.
>
> I'm thinking that the divide approximation really should be worked out
> in detail for the two-pixel case.  There has to be some redundancy to
> take advantage of.

I've given it a try. We need to calculate W1 = 1/M and W2 = 1/(M + dMdX). The 
problem is that we know nothing about dMdX, so we know nothing about the 
relationship between the two reciprocals; they're essentially independent. I 
worked it out and ended up with

W1 = 1 / M
W2 = W1 - dMdX / (M*M + M*dMdX)

Looks worse than just calculating two values. Alternatively, we could use a 
linear approximation

W1 = 1 / M
W2 = W1 + W1' * dMdX            (where W1' is the derivative of W1 wrt M)

which yields

W2 = W1 - dMdX*W1*W1

I'm getting a little tired, but I think the quotient of the approximate and 
exact values works out to

1 - (dMdX*dMdX*W1*W1)

which would be pretty bad, since the error grows quadratically with both dMdX 
and W1. I think that means that any surface that is not parallel to the screen 
will run into trouble quickly, and that the closer an object is, the bigger 
the distortion.
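A quick numerical check of the algebra above, in plain Python (M and dMdX 
values are arbitrary test inputs):

```python
def w2_exact(M, dMdX):
    """W2 = 1/(M + dMdX), the value we actually want."""
    return 1.0 / (M + dMdX)

def w2_linear(M, dMdX):
    """Linear approximation W2 = W1 - dMdX*W1*W1."""
    W1 = 1.0 / M
    return W1 - dMdX * W1 * W1

# The quotient approx/exact should be 1 - (dMdX*W1)^2, so halving dMdX
# should cut the error term by a factor of four.
M = 1.5
W1 = 1.0 / M
for dMdX in (0.04, 0.02, 0.01):
    quotient = w2_linear(M, dMdX) / w2_exact(M, dMdX)
    predicted = 1.0 - (dMdX * W1) ** 2
    print(dMdX, quotient, predicted)
```

The quotient matches 1 - (dMdX*W1)^2 to floating-point accuracy, which 
confirms the quadratic growth: doubling dMdX quadruples the relative error.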

> > That allows us to do a linear
> > interpolation while reading only one word per pixel, at a cost of
> > some precision. It looks like I can almost get 13 bits of precision
> > this way, and a 4-bit multiplier can probably be done in normal
> > logic, without using the inbuilt multipliers on the FPGA.
> >
> > I guess when I get it to work we'll have to put it into the software
> > model and see what the result looks like.
>
> Yes.  I imagine you're getting pretty close to the money with 13 bits of
> precision.

I hope so. We'll have to see what it looks like. Colours are only 8 bits 
anyway, so that won't be a problem, but I'm not sure about texture 
coordinates.
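For comparison, the two-word interpolation variant quoted earlier is easy to 
model as well; again the table layout is an assumption on my part, and the 
interpolation coefficient is kept at full precision here, which the hardware 
would not do:

```python
LUT_BITS = 10
OUT_BITS = 16

# Table samples 1/m at the interval endpoints; one extra entry covers the
# top interval so table[i] and table[i + 1] always both exist.
table = [round(2**OUT_BITS / (1.0 + i / 2**LUT_BITS))
         for i in range(2**LUT_BITS + 1)]

def recip_interp(m):
    """Approximate 1/m for m in [1, 2) by interpolating two adjacent entries."""
    x = (m - 1.0) * 2**LUT_BITS
    i = int(x)
    frac = x - i                     # full-precision coefficient (idealized)
    return (table[i] + frac * (table[i + 1] - table[i])) / 2**OUT_BITS

# Worst-case error over a fine sweep of the mantissa range.
max_err = max(abs(recip_interp(1.0 + k / 2**16) - 1.0 / (1.0 + k / 2**16))
              for k in range(2**16))
```

This idealized model comes out around 16-17 good bits; quantizing the 
interpolation coefficient and the inputs is what brings it down toward the 15 
(or, in the one-word-per-pixel variant, 13) bits discussed above.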

> > Incidentally, the software model currently uses 32-bit floats where
> > the hardware would be using 25-bit floats. I guess we need a real
> > float25 class with appropriately diminished performance...
>
> Yes, and just to be kind to old C hacks like me, please don't overload
> the math operators, just make it a function so it's readable out of
> context.

Will you change the model around to replace all the + signs with .add() calls 
then?

I was thinking of overloading the operators with versions that give full 
16-bit precision, and having separate functions for approximations like this 
one.

Lourens
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
