https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100865

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <h...@gcc.gnu.org>:

https://gcc.gnu.org/g:edafb35bdadf309ebb9d1eddc5549f9e1ad49c09

commit r12-1958-gedafb35bdadf309ebb9d1eddc5549f9e1ad49c09
Author: H.J. Lu <hjl.to...@gmail.com>
Date:   Wed Jun 2 07:15:45 2021 -0700

    x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

    1. Update move expanders to convert CONST_WIDE_INT and CONST_VECTOR
    operands to a vector broadcast from an integer when AVX is enabled.
    2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register that
    does not increase the stack alignment requirement and that blocks
    transformation by the combine pass.
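
The conversion in item 1 depends on recognizing that a wide constant is one small element repeated. The following is a minimal sketch of that check in plain C, not the code added to i386-expand.c: the helper name broadcast_element_size and its array-of-words interface are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code.  Given a constant stored as N
   64-bit words (N >= 1), return the size in bytes (1, 2, 4 or 8) of the
   narrowest element whose repetition reproduces the whole constant, or
   0 if the 64-bit words are not all equal.  */
static unsigned
broadcast_element_size (const uint64_t *words, size_t n)
{
  /* Every 64-bit word must match the first one.  */
  for (size_t i = 1; i < n; i++)
    if (words[i] != words[0])
      return 0;

  uint64_t w = words[0];

  /* Keep halving the element width while both halves are identical.  */
  if ((uint32_t) w != (uint32_t) (w >> 32))
    return 8;
  if ((uint16_t) w != (uint16_t) (w >> 16))
    return 4;
  if ((uint8_t) w != (uint8_t) (w >> 8))
    return 2;
  return 1;
}
```

A constant for which this returns a nonzero size could be built by placing one element in an integer register and broadcasting it, rather than loading the full constant from memory.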

    A small benchmark:

    https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

    shows that broadcast is a little bit faster on Intel Core i7-8559U:

    $ make
    gcc -g -I. -O2   -c -o test.o test.c
    gcc -g   -c -o memory.o memory.S
    gcc -g   -c -o broadcast.o broadcast.S
    gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
    gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
    ./test
    memory      : 147215
    broadcast   : 121213
    vec_dup_sse2: 171366
    $

    broadcast is also smaller:

    $ size memory.o broadcast.o
       text    data     bss     dec     hex filename
        132       0       0     132      84 memory.o
        122       0       0     122      7a broadcast.o
    $
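
For context, this is the kind of source that can produce such a constant store. It is a hypothetical example, not one of the pr100865 tests, and the function name fill32 is invented here. Built with optimization and AVX2 enabled (for instance -O2 -mavx2, an assumed flag choice), a compiler with this change may materialize the 32-byte constant by broadcasting the byte value from an integer register instead of loading it from the constant pool. The exact instruction sequence depends on target flags and tuning.

```c
#include <string.h>

/* Hypothetical example: store 32 copies of the byte 3.  The fixed size
   lets the compiler expand the memset inline as one wide constant
   store.  */
void
fill32 (unsigned char *dst)
{
  memset (dst, 3, 32);
}
```

Inspecting the output of gcc -S for this function is one way to see whether a broadcast or a memory load was chosen.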

    3. Update PR 87767 tests to expect integer broadcast instead of broadcast
    from memory.
    4. Update avx512f_cond_move.c to expect integer broadcast.

    Another small benchmark:

    https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast

    shows that integer broadcast is faster than embedded memory broadcast:

    $ make
    gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
    gcc -g   -c -o memory.o memory.S
    gcc -g   -c -o broadcast.o broadcast.S
    gcc -o test test.o memory.o broadcast.o
    ./test
    memory      : 425538
    broadcast   : 375260
    $

    gcc/

            PR target/100865
            * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
            New prototype.
            (ix86_byte_broadcast): New function.
            (ix86_convert_const_wide_int_to_broadcast): Likewise.
            (ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode
            size is 16 bytes or bigger.
            (ix86_broadcast_from_integer_constant): New function.
            (ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR
            to broadcast if mode size is 16 bytes or bigger.
            * config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
            prototype.
            * config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function.

    gcc/testsuite/

            PR target/100865
            * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
            broadcast.
            * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
            * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
            * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
            * gcc.target/i386/avx512f_cond_move.c: Also pass
            -mprefer-vector-width=512 and expect integer broadcast.
            * gcc.target/i386/pr100865-1.c: New test.
            * gcc.target/i386/pr100865-2.c: Likewise.
            * gcc.target/i386/pr100865-3.c: Likewise.
            * gcc.target/i386/pr100865-4a.c: Likewise.
            * gcc.target/i386/pr100865-4b.c: Likewise.
            * gcc.target/i386/pr100865-5a.c: Likewise.
            * gcc.target/i386/pr100865-5b.c: Likewise.
            * gcc.target/i386/pr100865-6a.c: Likewise.
            * gcc.target/i386/pr100865-6b.c: Likewise.
            * gcc.target/i386/pr100865-6c.c: Likewise.
            * gcc.target/i386/pr100865-7a.c: Likewise.
            * gcc.target/i386/pr100865-7b.c: Likewise.
            * gcc.target/i386/pr100865-7c.c: Likewise.
            * gcc.target/i386/pr100865-8a.c: Likewise.
            * gcc.target/i386/pr100865-8b.c: Likewise.
            * gcc.target/i386/pr100865-8c.c: Likewise.
            * gcc.target/i386/pr100865-9a.c: Likewise.
            * gcc.target/i386/pr100865-9b.c: Likewise.
            * gcc.target/i386/pr100865-9c.c: Likewise.
            * gcc.target/i386/pr100865-10a.c: Likewise.
            * gcc.target/i386/pr100865-10b.c: Likewise.
            * gcc.target/i386/pr100865-11a.c: Likewise.
            * gcc.target/i386/pr100865-11b.c: Likewise.
            * gcc.target/i386/pr100865-11c.c: Likewise.
            * gcc.target/i386/pr100865-12a.c: Likewise.
            * gcc.target/i386/pr100865-12b.c: Likewise.
            * gcc.target/i386/pr100865-12c.c: Likewise.
