------- Comment #8 from vda dot linux at googlemail dot com  2009-06-21 16:11 -------
(In reply to comment #7)
> It seems to make sense to bump cost of idiv a bit, given the fact that there
> are register pressure implications.
>
> I would like to however understand what code sequences we produce that are
> estimated to be long but ends up being shorter in practice.  Would be possible
> to try to give me some examples of constants where it is important to bump
> cost to 8?  It is possible we can simply fix cost estimation in divmod
> expansion instead.

Attached t.c.bz2 is a good source file to experiment with.

With last month's svn snapshot of gcc, I did the following:

    /usr/app/gcc-4.4.svn.20090528/bin/gcc -g0 -Os -fomit-frame-pointer -ffunction-sections -c t.c
    objdump -dr t.o >t.asm

with and without the patch, and compared the results.
(-ffunction-sections is used merely because it makes "objdump -dr" output
much more suitable for diffing).

Here is the diff between the code generated by unpatched and patched gcc
for int_x / 16:

 Disassembly of section .text.id_x_16:
 0000000000000000 <id_x_16>:
-   0:  89 f8                   mov    %edi,%eax
-   2:  ba 10 00 00 00          mov    $0x10,%edx
-   7:  89 d1                   mov    %edx,%ecx
-   9:  99                      cltd
-   a:  f7 f9                   idiv   %ecx
-   c:  c3                      retq
+   0:  8d 47 0f                lea    0xf(%rdi),%eax
+   3:  85 ff                   test   %edi,%edi
+   5:  0f 49 c7                cmovns %edi,%eax
+   8:  c1 f8 04                sar    $0x4,%eax
+   b:  c3                      retq

int_x / 2:

 Disassembly of section .text.id_x_2:
 0000000000000000 <id_x_2>:
    0:  89 f8                   mov    %edi,%eax
-   2:  ba 02 00 00 00          mov    $0x2,%edx
-   7:  89 d1                   mov    %edx,%ecx
-   9:  99                      cltd
-   a:  f7 f9                   idiv   %ecx
-   c:  c3                      retq
+   2:  c1 e8 1f                shr    $0x1f,%eax
+   5:  01 f8                   add    %edi,%eax
+   7:  d1 f8                   sar    %eax
+   9:  c3                      retq

As you can see, the code becomes smaller and *much* faster (not even a mul insn
is used now).

Here is an example of unsigned_x / 641. In this case, the code size is the
same, but the code is faster:

 Disassembly of section .text.ud_x_641:
 0000000000000000 <ud_x_641>:
-   0:  ba 81 02 00 00          mov    $0x281,%edx
-   5:  89 f8                   mov    %edi,%eax
-   7:  89 d1                   mov    %edx,%ecx
-   9:  31 d2                   xor    %edx,%edx
-   b:  f7 f1                   div    %ecx
+   0:  89 f8                   mov    %edi,%eax
+   2:  48 69 c0 81 3d 66 00    imul   $0x663d81,%rax,%rax
+   9:  48 c1 e8 20             shr    $0x20,%rax
    d:  c3                      retq

There is not a single instance of code growth. Either the newer gcc is better,
or maybe the code growth cases occur in 32-bit code only.

I will attach t64.asm.diff; take a look if you want to see all the changes in
the generated code.


-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354
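
For readers who prefer C to disassembly, the three patched sequences quoted above can be approximated as follows. This is only a sketch, not the contents of t.c: the function names (sdiv16, sdiv2, udiv641, check_samples) are made up for illustration, and it assumes a 32-bit int and that >> on a negative int is an arithmetic shift, which is what gcc does but which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Signed x / 16, following lea 0xf / test / cmovns / sar $4:
   negative inputs get a bias of 15 so the shift truncates toward zero. */
int sdiv16(int x)
{
    int biased = (x < 0) ? x + 15 : x;  /* bias only negatives: cannot overflow */
    return biased >> 4;                 /* assumption: arithmetic right shift */
}

/* Signed x / 2, following shr $31 / add / sar: the sign bit itself is the bias. */
int sdiv2(int x)
{
    int sign = (int)((unsigned)x >> 31);  /* 1 if x is negative, else 0 */
    return (x + sign) >> 1;               /* assumption: arithmetic right shift */
}

/* Unsigned x / 641, following imul $0x663d81 / shr $32.
   641 * 0x663d81 == 2^32 + 1, so the high half of the 64-bit product
   is the quotient for every 32-bit x. */
uint32_t udiv641(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0x663d81u) >> 32);
}

/* Hypothetical self-check: compare each rewrite with the compiler's own division. */
void check_samples(void)
{
    static const int s[] = { INT_MIN, -641, -17, -16, -15, -3, -2, -1,
                             0, 1, 2, 15, 16, 17, 640, 641, INT_MAX };
    unsigned i;
    for (i = 0; i < sizeof s / sizeof s[0]; i++) {
        uint32_t u = (uint32_t)s[i];
        assert(sdiv16(s[i]) == s[i] / 16);
        assert(sdiv2(s[i]) == s[i] / 2);
        assert(udiv641(u) == u / 641u);
    }
}
```

Calling check_samples() from a small main() exercises the boundary values. Compiling these functions with the same -Os flags and running objdump -dr on the result is one way to see whether a given gcc emits comparable sequences.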