On 1/19/2022 11:42 AM, Zhao Wei Liew via Gcc-patches wrote:
This patch implements an optimization for the following C++ code:

int f(int x) {
     return 1 / x;
}

int f(unsigned int x) {
     return 1 / x;
}

Before this patch, x86-64 gcc -std=c++20 -O3 produces the following assembly:

f(int):
     xor edx, edx
     mov eax, 1
     idiv edi
     ret
f(unsigned int):
     xor edx, edx
     mov eax, 1
     div edi
     ret

In comparison, clang++ -std=c++20 -O3 produces the following assembly:

f(int):
     lea ecx, [rdi + 1]
     xor eax, eax
     cmp ecx, 3
     cmovb eax, edi
     ret
f(unsigned int):
     xor eax, eax
     cmp edi, 1
     sete al
     ret

Clang's output is more efficient because it avoids the expensive div and idiv instructions.

With this patch, GCC now produces the following assembly:

f(int):
     lea eax, [rdi + 1]
     cmp eax, 2
     mov eax, 0
     cmovbe eax, edi
     ret
f(unsigned int):
     xor eax, eax
     cmp edi, 1
     sete al
     ret

which is virtually identical to Clang's assembly output. Any slight differences
in the output for f(int) are possibly related to a different missed optimization.
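
For readers who want the arithmetic behind the assembly above, here is a minimal C++ sketch of the same identities. The helper names (recip_unsigned, recip_signed, check_recip_matches_division) are hypothetical and chosen only for illustration; the code mirrors what the generated assembly computes, not the match.pd pattern itself. Division by zero is undefined in the original expression, so the sketch is free to return anything for x == 0.

```cpp
#include <cassert>
#include <climits>

// Unsigned case: 1 / x is 1 only when x == 1, and 0 for every other nonzero x.
unsigned int recip_unsigned(unsigned int x) {
    return x == 1u ? 1u : 0u;
}

// Signed case: 1 / x equals x when x is -1 or 1, and 0 for every other
// nonzero x. Adding 1 in unsigned arithmetic maps {-1, 0, 1} to {0, 1, 2},
// so one unsigned comparison performs the range check (the lea/cmp/cmovbe
// sequence). For x == 0 this returns 0, which is acceptable because the
// original division is undefined there.
int recip_signed(int x) {
    return (static_cast<unsigned int>(x) + 1u <= 2u) ? x : 0;
}

// Checks both helpers against real truncating division on nonzero inputs.
void check_recip_matches_division() {
    const int samples[] = {INT_MIN, -7, -2, -1, 1, 2, 7, INT_MAX};
    for (int x : samples) {
        assert(recip_signed(x) == 1 / x);
        unsigned int u = static_cast<unsigned int>(x);  // nonzero for all samples
        assert(recip_unsigned(u) == 1u / u);
    }
}
```

The unsigned addition is what lets one comparison replace the two tests x == -1 and x == 1; performing the addition on int instead would overflow for INT_MAX.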

v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587751.html
Changes from v2:
1. Refactor from using a switch statement to using the built-in
if-else statement.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587634.html
Changes from v1:
1. Refactor common if conditions.
2. Use build_[minus_]one_cst (type) to get -1/1 of the correct type.
3. Match only for TRUNC_DIV_EXPR and TYPE_PRECISION (type) > 1.

gcc/ChangeLog:

        * match.pd: Simplify 1 / X where X is an integer.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/divide-6.c: New test.
        * gcc.dg/tree-ssa/divide-7.c: New test.

Thanks. Given the original submission and most of the review work were done prior to stage3 closing, I went ahead and installed this on the trunk.
jeff
