Optimize i2f()

2012-08-03 Thread Alex Deucher
On Mon, Jul 30, 2012 at 5:49 PM, Steven Fuerst wrote:
> Looking through the kernel radeon drm source, it looks like the i2f()
> functions in r600_blit.c and r600_blit_kms.c can be optimized a bit.

Care to send a patch?

Thanks,

Alex

>
> The following extends the range to all unsigned 32-bit integers, and avoids
> the slow loop by using the bsr instruction via __fls().  It provides an
> exact 1-1 correspondence up to 2^24.  Above that, there is the inevitable
> rounding.  This routine rounds towards zero (truncation).
>
> /* 23 bits of float fractional data */
> #define I2F_FRAC_BITS 23
> #define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)
>
> /*
>  * Converts an unsigned integer into 32-bit IEEE floating point
>  * representation.
>  * Will be exact from 0 to 2^24.  Above that, we round towards zero
>  * as the fractional bits will not fit in a float.  (It would be better to
>  * round towards even as the fpu does, but that is slower.)
>  * This routine depends on the mod(32) behaviour of the rotate instructions
>  * on x86.
>  */
> uint32_t i2f(uint32_t x)
> {
> 	uint32_t msb, exponent, fraction;
>
> 	/* Zero is special */
> 	if (!x) return 0;
>
> 	/* Get location of the most significant bit */
> 	msb = __fls(x);
>
> 	/*
> 	 * Use a rotate instead of a shift because that works both leftwards
> 	 * and rightwards due to the mod(32) behaviour.  This means we don't
> 	 * need to check to see if we are above 2^24 or not.
> 	 */
> 	fraction = ror32(x, msb - I2F_FRAC_BITS) & I2F_MASK;
> 	exponent = (127 + msb) << I2F_FRAC_BITS;
>
> 	return fraction + exponent;
> }
>
> Steven Fuerst
>
> ___
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>


Optimize i2f()

2012-07-30 Thread Steven Fuerst
Looking through the kernel radeon drm source, it looks like the i2f()
functions in r600_blit.c and r600_blit_kms.c can be optimized a bit.

The following extends the range to all unsigned 32-bit integers, and avoids
the slow loop by using the bsr instruction via __fls().  It provides an
exact 1-1 correspondence up to 2^24.  Above that, there is the inevitable
rounding.  This routine rounds towards zero (truncation).
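
To make the 2^24 limit concrete, here is a small user-space sketch (not part
of the proposed routine). A float has a 24-bit significand, so above 2^24 the
low bits have to go somewhere. The helper names truncate_to_24_bits(),
round_trip_via_float() and rounding_demo_check() are made up for this
illustration, and the comparison assumes the compiler converts integers to
float with IEEE round-to-nearest-even, which is typical but not guaranteed
by the C standard.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: keep only the 24 most significant bits of x.
 * This is the integer value that a truncating (round towards zero)
 * int-to-float conversion ends up representing.
 */
static uint32_t truncate_to_24_bits(uint32_t x)
{
	uint32_t v = x;
	unsigned int drop = 0;

	/* Count how many low bits do not fit in a 24-bit significand. */
	while (v >> 24) {
		v >>= 1;
		drop++;
	}
	return (x >> drop) << drop;
}

/*
 * Hypothetical helper: convert through the compiler's own int-to-float
 * path and back. Only meaningful while the rounded value stays below
 * 2^32. Assumes round-to-nearest-even conversion (an assumption about
 * the platform, not something the C standard promises).
 */
static uint32_t round_trip_via_float(uint32_t x)
{
	volatile float f = (float)x;

	return (uint32_t)f;
}

/* Spot checks around 2^24 = 16777216. */
void rounding_demo_check(void)
{
	/* Exactly representable: nothing is lost. */
	assert(truncate_to_24_bits(16777216u) == 16777216u);
	/* One bit too many: truncation drops the low bit. */
	assert(truncate_to_24_bits(16777217u) == 16777216u);
	/* A tie: truncation goes down, round-to-nearest-even goes up. */
	assert(truncate_to_24_bits(16777219u) == 16777218u);
	assert(round_trip_via_float(16777219u) == 16777220u);
}
```

So 16777219 is a value where truncation and the FPU's usual rounding disagree
by one unit in the last place, which is the trade-off mentioned above.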

/* 23 bits of float fractional data */
#define I2F_FRAC_BITS 23
#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)

/*
 * Converts an unsigned integer into 32-bit IEEE floating point
 * representation.
 * Will be exact from 0 to 2^24.  Above that, we round towards zero
 * as the fractional bits will not fit in a float.  (It would be better to
 * round towards even as the fpu does, but that is slower.)
 * This routine depends on the mod(32) behaviour of the rotate instructions
 * on x86.
 */
uint32_t i2f(uint32_t x)
{
	uint32_t msb, exponent, fraction;

	/* Zero is special */
	if (!x) return 0;

	/* Get location of the most significant bit */
	msb = __fls(x);

	/*
	 * Use a rotate instead of a shift because that works both leftwards
	 * and rightwards due to the mod(32) behaviour.  This means we don't
	 * need to check to see if we are above 2^24 or not.
	 */
	fraction = ror32(x, msb - I2F_FRAC_BITS) & I2F_MASK;
	exponent = (127 + msb) << I2F_FRAC_BITS;

	return fraction + exponent;
}

Steven Fuerst
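
For trying the idea outside the kernel, the following is a rough user-space
sketch of the same routine. It is not the code proposed above: msb_index()
and ror32_masked() are hypothetical stand-ins for the kernel's __fls() and
ror32(), and the rotate count is reduced mod 32 explicitly, so the sketch
does not lean on the x86 behaviour that the comment above mentions.
__builtin_clz is a GCC/Clang builtin; a plain loop is used elsewhere.
i2f_portable(), float_bits() and i2f_portable_check() are likewise names
invented here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define I2F_FRAC_BITS 23
#define I2F_MASK ((1u << I2F_FRAC_BITS) - 1)

/* Stand-in for __fls(): index of the highest set bit. Requires x != 0. */
static unsigned int msb_index(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
	return 31u - (unsigned int)__builtin_clz(x);
#else
	unsigned int n = 0;

	while (x >>= 1)
		n++;
	return n;
#endif
}

/* Rotate right, with the count taken mod 32 in C rather than in hardware. */
static uint32_t ror32_masked(uint32_t word, unsigned int shift)
{
	shift &= 31;
	return (word >> shift) | (word << ((32 - shift) & 31));
}

/* Same structure as i2f() above, built on the portable helpers. */
uint32_t i2f_portable(uint32_t x)
{
	uint32_t msb, exponent, fraction;

	if (!x)
		return 0;

	msb = msb_index(x);

	/*
	 * For msb < 23 the unsigned subtraction wraps around; after masking,
	 * the rotate right by (msb - 23) mod 32 acts as a left shift by
	 * (23 - msb). The implicit leading 1 lands on bit 23 and is masked off.
	 */
	fraction = ror32_masked(x, msb - I2F_FRAC_BITS) & I2F_MASK;
	exponent = (127 + msb) << I2F_FRAC_BITS;

	return fraction + exponent;
}

/* Reinterpret a float as its 32-bit pattern without aliasing problems. */
static uint32_t float_bits(float f)
{
	uint32_t u;

	memcpy(&u, &f, sizeof u);
	return u;
}

/* Compare against the compiler's conversion over the exact range. */
void i2f_portable_check(void)
{
	uint32_t x;

	for (x = 0; x <= (1u << 24); x++)
		assert(i2f_portable(x) == float_bits((float)x));

	/* Above 2^24 the low bit is truncated: 2^24 + 1 encodes as 2^24. */
	assert(i2f_portable(16777217u) == 0x4b800000u);
}
```

Calling i2f_portable_check() walks every integer from 0 to 2^24 and compares
the bit pattern with the compiler's own conversion, assuming float is IEEE
binary32 on the machine in question.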