>>>>> "MB" == Miles Bader <mi...@gnu.org> writes:

MB> The compiler generates the following assembly:

MB>     mov     %esi, %eax
MB>     mov     %edi, %edi
MB>     imulq   %rdi, %rax
MB>     addq    $32768, %rax
MB>     shrq    $16, %rax

That does not match the C code, though; it rounds negative values
incorrectly.

The C version rounds to nearest, with ties going away from zero.

Using the single-arg version of imulq generates a 128-bit result (in
rdx:rax).  Given that the multiplicands were only 32 bits, the more
significant half will be 0 if the product is >= 0 and -1 if the
product is < 0.  Adding that, in addition to the 32768, to rax ensures
that the result of the >>= 16 is rounded the way freetype wants.
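
To make the rounding argument concrete, here is a rough C model of it.
It is only a sketch, not the freetype source: mulfix_ref() is my
paraphrase of the sign-and-magnitude C code, mulfix_hi() stands in for
the assembly, both names are made up for illustration, and it assumes
that >> on a negative int64_t is an arithmetic shift (true for gcc and
clang, but implementation-defined in C).

```c
#include <stdint.h>

/* Paraphrase of the C approach: work on magnitudes, round half up,
 * then restore the sign, which rounds ties away from zero. */
static int64_t mulfix_ref(int32_t a, int32_t b)
{
    int     s = 1;
    int64_t x = a, y = b;

    if (x < 0) { x = -x; s = -s; }
    if (y < 0) { y = -y; s = -s; }

    int64_t c = (x * y + 0x8000) >> 16;
    return s > 0 ? c : -c;
}

/* Model of the assembly: hi plays the role of the upper 64 bits of
 * the 128-bit product, which is 0 for a non-negative product and -1
 * for a negative one when the inputs are only 32 bits wide. */
static int64_t mulfix_hi(int32_t a, int32_t b)
{
    int64_t p  = (int64_t)a * b;
    int64_t hi = -(int64_t)(p < 0);

    /* Assumes an arithmetic right shift for negative values. */
    return (p + hi + 0x8000) >> 16;
}
```

The interesting inputs are the ties: with a = -1 and b = 0x8000 both
functions give -1, whereas adding only 0x8000 before the shift would
give 0.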

If you use the two arg version of imul, you have to copy the msb of the
result (or do a compare and jump, like the C code) to determine whether
to add 0x8000 or 0x7FFF.
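
In C terms, copying the msb amounts to something like the following.
Again this is just an illustrative sketch: the function name is
hypothetical and it assumes >> on a negative int64_t is an arithmetic
shift.

```c
#include <stdint.h>

/* Two-arg flavour: only the low 64 bits of the product are available,
 * so the sign is recovered from the product's own top bit. */
static int64_t mulfix_signbit(int32_t a, int32_t b)
{
    int64_t p    = (int64_t)a * b;
    int64_t sign = p >> 63;          /* 0 or -1, assuming arithmetic shift */
    int64_t bias = 0x8000 + sign;    /* 0x8000 if p >= 0, 0x7FFF if p < 0 */

    return (p + bias) >> 16;
}
```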

Matching the rounding was the hardest part.  What provided the most
(in-order) speedups was noting that the upper 64 bits of the 128-bit
product would always be just sign-extension bits and that, because of
the prototype of FT_MulFix() itself, the values are already promoted
to 64 bits before they get to the assembly.

If it can be done better, though, I'd be happy to know!

Thanks for also looking at it.

-JimC
-- 
James Cloos <cl...@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6

_______________________________________________
Freetype-devel mailing list
Freetype-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/freetype-devel