>>>>> "MB" == Miles Bader <mi...@gnu.org> writes:
MB> Hm, are you sure that's not backwards?  When I tried the git C version[*],
MB> as well as your most recent FT_MulFix_x86_64, it returned 0xFFFF8506...

Odd.  Adding your algo to my test app, I get:

    7AFA8000, FFFFFFFF, FFFF8505, FFFF8505, FFFF8506
    # a     , b       , FT      , JC      , MB

I see that I have one small error in the C code in my app.  FT has:

    c = (FT_Long)( ( (FT_Int64)a * b + 0x8000L ) >> 16 );

whereas I used:

    c = (int32_t)(((int64_t)a*b + 0x8000L) >> 16);

But changing the int32_t to long does not change the results.  Yours
is still always +1 compared to the C, whenever the first arg represents
a positive value with fractional part == 1/2.

Oddly, though, gcc now refuses to compile my asm, even though it did
so before, complaining that it cannot guess what arg size to use for
the imul....  Weird.  (The existing executables prove that it used to.)

A simple way around that is to specify "D" and "S" as the constraints
for a and b.  (The rdi and rsi registers are where the x86_64 ABI puts
the first two args which are passed to a function.)
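For anyone who wants to reproduce the JC and MB columns above, here is a minimal sketch.  It is a reconstruction, not the code from my test app: the names mulfix_half_up and mulfix_asm are made up for illustration, the asm body is inferred from the instruction sequence discussed in this thread, and the non-x86_64 fallback is an assumption added only so that the file builds elsewhere.  With the operands pinned to rdi and rsi via "D" and "S", the one-operand imul gets a register operand, so the assembler has no operand size to guess.

```c
#include <stdint.h>

/* Plain rounding: add one half (0x8000), then shift right by 16.
   Relies on >> of a negative value being an arithmetic shift, which
   is what gcc and clang do.  Ties round toward +infinity. */
int32_t mulfix_half_up(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b + 0x8000) >> 16);
}

/* Hypothetical reconstruction of the asm variant. */
int32_t mulfix_asm(int32_t a, int32_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    int64_t wa = a, wb = b;      /* sign-extend to full 64-bit registers */
    int64_t result;

    __asm__ ("mov  %1, %%rax\n\t"        /* rax = a                      */
             "imul %2\n\t"               /* rdx:rax = rax * b            */
             "add  %%rdx, %%rax\n\t"     /* rdx is -1 if product < 0     */
             "add  $0x8000, %%rax\n\t"   /* add one half                 */
             "sar  $16, %%rax\n\t"       /* drop the fractional bits     */
             : "=&a" (result)            /* result comes back in rax     */
             : "D" (wa), "S" (wb)        /* a in rdi, b in rsi           */
             : "rdx", "cc");
    return (int32_t)result;
#else
    /* Assumed portable equivalent: subtract 1 from negative products. */
    int64_t r = (int64_t)a * b;
    return (int32_t)((r + (r >> 63) + 0x8000) >> 16);
#endif
}
```

Called with a = 0x7AFA8000 and b = -1 (0xFFFFFFFF), mulfix_half_up gives 0xFFFF8506 and mulfix_asm gives 0xFFFF8505.  The two differ only when the product is negative and its fractional part is exactly 1/2.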

The disassembly of the final version is:

    00000000004006c0 <mf>:
      4006c0:  48 89 f8               mov    %rdi,%rax
      4006c3:  48 f7 ee               imul   %rsi
      4006c6:  48 01 d0               add    %rdx,%rax
      4006c9:  48 05 00 80 00 00      add    $0x8000,%rax
      4006cf:  48 c1 f8 10            sar    $0x10,%rax
      4006d3:  c3                     retq

And I get this disassembly of yours:

    0000000000400840 <miles>:
      400840:  48 63 c6               movslq %esi,%rax
      400843:  48 63 ff               movslq %edi,%rdi
      400846:  48 0f af c7            imul   %rdi,%rax
      40084a:  48 05 00 80 00 00      add    $0x8000,%rax
      400850:  48 c1 f8 10            sar    $0x10,%rax
      400854:  c3                     retq

I also just added this version to my test app:

    int another (int32_t a, int32_t b) {
        long r = (long)a * (long)b;
        long s = r >> 31;
        return (r + s + 0x8000) >> 16;
    }

That results in:

    0000000000400760 <another>:
      400760:  48 63 ff               movslq %edi,%rdi
      400763:  48 63 f6               movslq %esi,%rsi
      400766:  48 0f af f7            imul   %rdi,%rsi
      40076a:  48 89 f0               mov    %rsi,%rax
      40076d:  48 c1 f8 1f            sar    $0x1f,%rax
      400771:  48 8d 84 06 00 80 00   lea    0x8000(%rsi,%rax,1),%rax
      400778:  00
      400779:  48 c1 f8 10            sar    $0x10,%rax
      40077d:  c3                     retq

Since FT's C version uses longs, though, this:

    int another (long a, long b) {
        long r = (long)a * (long)b;
        long s = r >> 31;
        return (r + s + 0x8000) >> 16;
    }

gives:

    0000000000400760 <another>:
      400760:  48 0f af f7            imul   %rdi,%rsi
      400764:  48 89 f0               mov    %rsi,%rax
      400767:  48 c1 f8 1f            sar    $0x1f,%rax
      40076b:  48 8d 84 06 00 80 00   lea    0x8000(%rsi,%rax,1),%rax
      400772:  00
      400773:  48 c1 f8 10            sar    $0x10,%rax
      400777:  c3                     retq

So it would seem that when compiling for any processor where FT_Long
is the same as int64_t and where that fits into a single register,
then that last bit of C might be optimal, yes?

-JimC
-- 
James Cloos <cl...@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6
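
One caveat on the `another' variant, as a hedged sketch rather than a verdict.  With a 64-bit r, the term r >> 31 is 0 or -1 only while the product fits in 32 bits; for larger products it contributes extra low bits and can push a result across a rounding boundary.  Assuming the intent is for s to carry just the sign of the product, shifting by 63 would do that.  The names mulfix_shift31 and mulfix_shift63 are hypothetical, and int64_t stands in for long on an LP64 system.

```c
#include <stdint.h>

/* The variant as posted: s = r >> 31.  Relies on >> of a negative
   value being an arithmetic shift, as with gcc and clang. */
int32_t mulfix_shift31(int32_t a, int32_t b)
{
    int64_t r = (int64_t)a * b;
    int64_t s = r >> 31;          /* sign only while |r| < 2^31 */
    return (int32_t)((r + s + 0x8000) >> 16);
}

/* Assumed intent: s is -1 for a negative product, 0 otherwise. */
int32_t mulfix_shift63(int32_t a, int32_t b)
{
    int64_t r = (int64_t)a * b;
    int64_t s = r >> 63;          /* sign of the full 64-bit product */
    return (int32_t)((r + s + 0x8000) >> 16);
}
```

Both return 0xFFFF8505 for a = 0x7AFA8000, b = 0xFFFFFFFF.  They diverge on larger products: for a = 0x17FFF, b = 0x10001 the product is 0x180007FFF, so the shift-by-31 form returns 0x18001 while the shift-by-63 form returns 0x18000.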