On 6/1/2021 7:29 AM, H.J. Lu via Gcc-patches wrote:
On Tue, Jun 1, 2021 at 6:25 AM Richard Biener
<richard.guent...@gmail.com> wrote:
On Tue, Jun 1, 2021 at 3:05 PM H.J. Lu <hjl.to...@gmail.com> wrote:
On Mon, May 31, 2021 at 11:54:53PM -0600, Jeff Law wrote:

On 5/31/2021 11:50 PM, Richard Sandiford wrote:
"H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
On Mon, May 31, 2021 at 06:32:04AM -0700, H.J. Lu wrote:
On Mon, May 31, 2021 at 6:26 AM Richard Biener
<richard.guent...@gmail.com> wrote:
On Mon, May 31, 2021 at 3:12 PM H.J. Lu <hjl.to...@gmail.com> wrote:
On Mon, May 31, 2021 at 5:46 AM Richard Biener
<richard.guent...@gmail.com> wrote:
On Mon, May 31, 2021 at 2:09 PM H.J. Lu <hjl.to...@gmail.com> wrote:
On Wed, May 26, 2021 at 10:28:16AM +0200, Richard Biener wrote:
   -- Target Hook: rtx TARGET_GEN_MEMSET_VALUE (rtx DATA, scalar_int_mode
            MODE)
       This function returns the RTL of a register containing
       'GET_MODE_SIZE (MODE)' consecutive copies of the unsigned char
       value given in the RTL register DATA.  For example, if MODE is 4
       bytes wide, return the RTL for 0x01010101*DATA.
For this one I wonder if it should be an optab instead.  Couldn't you
use the existing vec_duplicate for this by using (paradoxical) subregs
like (subreg:TI (vec_duplicate:VnQI (subreg:VnQI (reg:QI ...)))?
I tried.   It doesn't even work on x86.  See:

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570661.html
Not sure what I should read from there...

There are special cases to subreg HI, SI and DI modes of TI mode in
ix86_gen_memset_value_from_prev.   simplify_gen_subreg doesn't
work here.   Each backend may need its own special handling.
OK, I guess I'm not (RTL) qualified enough to further review these parts,
sorry.  Since we're doing code generation the canonical way to communicate
with backends should be optabs, not some set of disconnected target hooks.
But as said, I probably don't know enough of RTL to see why it's the only way.

Richard.
Here is the patch to add optabs instead.  Does it look OK?

Thanks.

H.J.
---
Add 2 optabs:

1. integer_extract: Extract lower bit value from the integer value in
TImode, OImode or XImode.
That sounds very specific, esp. the restriction to {TI,OI,XI}mode.
It also sounds like it matches (subreg:{TI,OI,XI} (...) 0).  There are
existing target hooks verifying subreg validity - why's that not a good
fit here?  ISTR you say gen_lowpart () doesn't work (or was it
simplify_gen_subreg?), why's that so?
{TI,OI,XI}mode are storage only integer types.   subreg doesn't work
well on them.  I got

[hjl@gnu-cfl-2 pieces]$ cat s2.i
extern void *ops;

void
foo (int c)
{
    __builtin_memset (ops, c, 34);
}
[hjl@gnu-cfl-2 pieces]$ make s2.s
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/
-O2 -march=haswell -S s2.i
during RTL pass: reload
s2.i: In function ‘foo’:
s2.i:7:1: internal compiler error: maximum number of generated reload
insns per insn achieved (90)
      7 | }
        | ^
0x1050734 lra_constraints(bool)
/export/gnu/import/git/gitlab/x86-gcc/gcc/lra-constraints.c:5091
0x1039536 lra(_IO_FILE*)
/export/gnu/import/git/gitlab/x86-gcc/gcc/lra.c:2336
0xfe1140 do_reload
/export/gnu/import/git/gitlab/x86-gcc/gcc/ira.c:5822
0xfe162e execute
/export/gnu/import/git/gitlab/x86-gcc/gcc/ira.c:6008
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make: *** [Makefile:32: s2.s] Error 1
[hjl@gnu-cfl-2 pieces]$

due to

(insn 12 11 0 (set (mem:HI (plus:DI (reg/f:DI 84)
                  (const_int 32 [0x20])) [0 MEM <char[1:34]> [(void
*)ops.0_1]+32 S2 A8])
          (subreg:HI (reg:OI 51 xmm15) 0)) "s2.i":6:3 -1
       (nil))

The new optab gives us

(insn 12 11 13 2 (set (reg:TI 88)
          (reg:TI 51 xmm15)) "s2.i":6:3 -1
       (nil))
(insn 13 12 14 2 (set (reg:SI 89)
          (subreg:SI (reg:TI 88) 0)) "s2.i":6:3 -1
       (nil))
(insn 14 13 15 2 (set (reg:HI 87)
          (subreg:HI (reg:SI 89) 0)) "s2.i":6:3 -1
       (nil))
that looks odd to me - what's the final result after LRA?  I think
I got:

vmovd %edi, %xmm15
movq ops(%rip), %rdx
vpbroadcastb %xmm15, %ymm15
vmovq %xmm15, %rax    <<<< move to GPR
vmovdqu %ymm15, (%rdx)
movw %ax, 32(%rdx)   <<<< subreg of GPR
vzeroupper
ret

we should see to make lowpart_subreg work on {XI,OI,TI}mode.
Only two steps should be necessary at most:
xmm -> gpr, grp -> subreg, or gpr -> subreg.  So the expander
code in memset should try to generate the subreg directly
subreg didn't fail on x86 when I tried.

and if that fails, try a word_mode subreg followed by the subreg.
I will try word_mode subreg.

Here is the v2 patch to use word_mode subreg.  For

---
extern void *ops;

void
foo (int c)
{
    __builtin_memset (ops, 4, 32);
}
---

without vec_const_duplicate, I got

   movl    $4, %eax
   movq    ops(%rip), %rdx
   movd    %eax, %xmm0
   punpcklbw       %xmm0, %xmm0
   punpcklwd       %xmm0, %xmm0
   pshufd  $0, %xmm0, %xmm0
   movups  %xmm0, (%rdx)
   movups  %xmm0, 16(%rdx)
   ret

With vec_const_duplicate, I got

   movq    ops(%rip), %rax
   movdqa  .LC0(%rip), %xmm0
   movups  %xmm0, (%rax)
   movups  %xmm0, 16(%rax)
   ret

If vec_duplicate is allowed to fail, I don't need vec_const_duplicate.
I don't understand why we need an optab for this though.  If the operand
is constant then we should just be doing an ordinary move in which the
source is a CONST_VECTOR.  It's then up to the move patterns to handle
duplicated constants as efficiently as possible.  (Sorry if this was
discussed upthread and I missed it.)
That's exactly the point I'm trying to get across as well.

This is what we do today.  But I'd like to generate

         movl    $4, %eax
         vpbroadcastb    %eax, %ymm15
         movq    ops(%rip), %rax
         vmovdqu %ymm15, (%rax)
         vzeroupper
         ret

instead of

         vmovdqa .LC0(%rip), %ymm15
         movq    ops(%rip), %rax
         vmovdqu %ymm15, (%rax)
         vzeroupper
         ret

Do I need a vec_dup pattern for it?
I think we have special code sequences to materialize some
constant vectors already, we should be able to add to that, no?
We can do that for all 0s and all 1s at the final codegen.   For
other values, since we need a GPR, we can't do that.
You can catch them in your movxx expanders, you can create peep2 patterns that use available GPRs, etc.  I don't see a fundamental need to to introduce new target macros or hooks to handle this stuff.  In fact I've done both to handle a closely related issue on our port.

jeff

Reply via email to