On Mon, Jun 12, 2023 at 4:03 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> The following simple test case, from PR 104610, shows that memcmp () == 0
> can result in some bizarre code sequences on x86.
>
> int foo(char *a)
> {
>     static const char t[] = "0123456789012345678901234567890";
>     return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
> }
>
> with -O2 currently contains both:
>         xorl    %eax, %eax
>         xorl    $1, %eax
> and also
>         movl    $1, %eax
>         xorl    $1, %eax
>
> Changing the return type of foo to _Bool results in the equally
> bizarre:
>         xorl    %eax, %eax
>         testl   %eax, %eax
>         sete    %al
> and also
>         movl    $1, %eax
>         testl   %eax, %eax
>         sete    %al
>
> All these sequences set the result to a constant, but this optimization
> opportunity only occurs very late during compilation, by basic block
> duplication in the 322r.bbro pass, too late for CSE or peephole2 to
> do anything about it.  The problem is that the idiom expanded by
> compare_by_pieces for __builtin_memcmp_eq contains basic blocks that
> can't easily be optimized by if-conversion due to the multiple
> incoming edges on the fail block.
>
> In summary, compare_by_pieces generates code that looks like:
>
>         if (x[0] != y[0]) goto fail_label;
>         if (x[1] != y[1]) goto fail_label;
>         ...
>         if (x[n] != y[n]) goto fail_label;
>         result = 1;
>         goto end_label;
> fail_label:
>         result = 0;
> end_label:
>
> In theory, the RTL if-conversion pass could be enhanced to tackle
> arbitrarily complex if-then-else graphs, but the solution proposed
> here is to allow suitable targets to perform if-conversion during
> compare_by_pieces.  The x86, for example, can take advantage that
> all of the above comparisons set and test the zero flag (ZF), which
> can then be used in combination with sete.  Hence compare_by_pieces
> could instead generate:
>
>         if (x[0] != y[0]) goto fail_label;
>         if (x[1] != y[1]) goto fail_label;
>         ...
>         if (x[n] != y[n]) goto fail_label;
> fail_label:
>         sete result
>
> which requires one less basic block, and the redundant conditional
> branch to a label immediately after is cleaned up by GCC's existing
> RTL optimizations.
>
> For the test case above, where -O2 -msse4 previously generated:
>
> foo:    movdqu  (%rdi), %xmm0
>         pxor    .LC0(%rip), %xmm0
>         ptest   %xmm0, %xmm0
>         je      .L5
> .L2:    movl    $1, %eax
>         xorl    $1, %eax
>         ret
> .L5:    movdqu  16(%rdi), %xmm0
>         pxor    .LC1(%rip), %xmm0
>         ptest   %xmm0, %xmm0
>         jne     .L2
>         xorl    %eax, %eax
>         xorl    $1, %eax
>         ret
>
> we now generate:
>
> foo:    movdqu  (%rdi), %xmm0
>         pxor    .LC0(%rip), %xmm0
>         ptest   %xmm0, %xmm0
>         jne     .L2
>         movdqu  16(%rdi), %xmm0
>         pxor    .LC1(%rip), %xmm0
>         ptest   %xmm0, %xmm0
> .L2:    sete    %al
>         movzbl  %al, %eax
>         ret
>
> Using a target hook allows the large amount of intelligence already in
> compare_by_pieces to be re-used by the i386 backend, but this can also
> help other backends with condition flags where the equality result can
> be materialized.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-06-12  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.cc (ix86_finish_compare_by_pieces): New
>         function to provide a backend specific implementation.
>         (TARGET_FINISH_COMPARE_BY_PIECES): Use the above function.
>
>         * doc/tm.texi.in (TARGET_FINISH_COMPARE_BY_PIECES): New @hook.
>         * doc/tm.texi: Regenerate.
>
>         * expr.cc (compare_by_pieces): Call finish_compare_by_pieces in
>         targetm to finalize the RTL expansion.  Move the current
>         implementation to a default target hook.
>         * target.def (finish_compare_by_pieces): New target hook to allow
>         compare_by_pieces to be customized by the target.
>         * targhooks.cc (default_finish_compare_by_pieces): Default
>         implementation moved here from expr.cc's compare_by_pieces.
>         * targhooks.h (default_finish_compare_by_pieces): Prototype.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/pieces-memcmp-1.c: New test case.

This patch needs middle-end approval first.

Uros.

>
>
> Thanks in advance,
> Roger
> --
>

Reply via email to