Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]
Hi Haochen, on 2024/1/11 16:28, HAO CHEN GUI wrote: > Hi, > This patch eliminates unnecessary byte swaps for block clear on P8 > LE. For block clear, all the bytes are set to zero. The byte order > doesn't make sense. So the alignment of destination could be set to > the store mode size in stead of 1 byte in order to eliminates > unnecessary byte swap instructions on P8 LE. The test case shows the > problem. I agree with Richi's concern, a bytes swap can be eliminated if the bytes swapped result is known as before, one typical case is the vector constant with predicate const_vector_each_byte_same, we can do some optimization for that. BR, Kewen > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > regressions. Is this OK for trunk? > > Thanks > Gui Haochen > > ChangeLog > rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE > > gcc/ > PR target/113325 > * config/rs6000/rs6000-string.cc (expand_block_clear): Set the > alignment of destination to the size of mode. > > gcc/testsuite/ > PR target/113325 > * gcc.target/powerpc/pr113325.c: New. > > patch.diff > diff --git a/gcc/config/rs6000/rs6000-string.cc > b/gcc/config/rs6000/rs6000-string.cc > index 7f777666ba9..4c9b2cbeefc 100644 > --- a/gcc/config/rs6000/rs6000-string.cc > +++ b/gcc/config/rs6000/rs6000-string.cc > @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[]) > } > >dest = adjust_address (orig_dest, mode, offset); > - > + /* Set the alignment of dest to the size of mode in order to > + avoid unnecessary byte swaps on LE. */ > + set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT); >emit_move_insn (dest, CONST0_RTX (mode)); > } > > diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c > b/gcc/testsuite/gcc.target/powerpc/pr113325.c > new file mode 100644 > index 000..4a3cae019c2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ > +/* { dg-require-effective-target powerpc_p8vector_ok } */ > +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */ > + > +void* foo (void* s1) > +{ > + return __builtin_memset (s1, 0, 32); > +}
Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]
On Fri, Jan 12, 2024 at 3:15 AM HAO CHEN GUI wrote: > > Hi Richard, >Thanks so much for your comments. > > > >> patch.diff > >> diff --git a/gcc/config/rs6000/rs6000-string.cc > >> b/gcc/config/rs6000/rs6000-string.cc > >> index 7f777666ba9..4c9b2cbeefc 100644 > >> --- a/gcc/config/rs6000/rs6000-string.cc > >> +++ b/gcc/config/rs6000/rs6000-string.cc > >> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[]) > >> } > >> > >>dest = adjust_address (orig_dest, mode, offset); > >> - > >> + /* Set the alignment of dest to the size of mode in order to > >> +avoid unnecessary byte swaps on LE. */ > >> + set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT); > > > > but the alignment is now wrong which might cause ripple-down > > wrong-code effects, no? > > > > It's probably bad to hide the byte-swapping in the move patterns (I'm > > just guessing > > you do that) > > Here I just change the alignment of "dest" which is temporary used for > move. The orig_dest is untouched and keep the original alignment. The > subsequent insns which use orig_dest are not affected. I am not sure if > it causes ripple-down effects. Do you mean the dest might be reused > later? But I think the alignment is different even though the mode and > offset is the same. If the MEM ends up in the IL then its MEM_ALIGN should be better correct. > Looking forward to your advice. > > Thanks > Gui Haochen
Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]
Hi Richard, Thanks so much for your comments. >> patch.diff >> diff --git a/gcc/config/rs6000/rs6000-string.cc >> b/gcc/config/rs6000/rs6000-string.cc >> index 7f777666ba9..4c9b2cbeefc 100644 >> --- a/gcc/config/rs6000/rs6000-string.cc >> +++ b/gcc/config/rs6000/rs6000-string.cc >> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[]) >> } >> >>dest = adjust_address (orig_dest, mode, offset); >> - >> + /* Set the alignment of dest to the size of mode in order to >> +avoid unnecessary byte swaps on LE. */ >> + set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT); > > but the alignment is now wrong which might cause ripple-down > wrong-code effects, no? > > It's probably bad to hide the byte-swapping in the move patterns (I'm > just guessing > you do that) Here I just change the alignment of "dest" which is temporary used for move. The orig_dest is untouched and keep the original alignment. The subsequent insns which use orig_dest are not affected. I am not sure if it causes ripple-down effects. Do you mean the dest might be reused later? But I think the alignment is different even though the mode and offset is the same. Looking forward to your advice. Thanks Gui Haochen
Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]
On Thu, Jan 11, 2024 at 9:30 AM HAO CHEN GUI wrote: > > Hi, > This patch eliminates unnecessary byte swaps for block clear on P8 > LE. For block clear, all the bytes are set to zero. The byte order > doesn't make sense. So the alignment of destination could be set to > the store mode size in stead of 1 byte in order to eliminates > unnecessary byte swap instructions on P8 LE. The test case shows the > problem. > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > regressions. Is this OK for trunk? > > Thanks > Gui Haochen > > ChangeLog > rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE > > gcc/ > PR target/113325 > * config/rs6000/rs6000-string.cc (expand_block_clear): Set the > alignment of destination to the size of mode. > > gcc/testsuite/ > PR target/113325 > * gcc.target/powerpc/pr113325.c: New. > > patch.diff > diff --git a/gcc/config/rs6000/rs6000-string.cc > b/gcc/config/rs6000/rs6000-string.cc > index 7f777666ba9..4c9b2cbeefc 100644 > --- a/gcc/config/rs6000/rs6000-string.cc > +++ b/gcc/config/rs6000/rs6000-string.cc > @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[]) > } > >dest = adjust_address (orig_dest, mode, offset); > - > + /* Set the alignment of dest to the size of mode in order to > +avoid unnecessary byte swaps on LE. */ > + set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT); but the alignment is now wrong which might cause ripple-down wrong-code effects, no? It's probably bad to hide the byte-swapping in the move patterns (I'm just guessing you do that) >emit_move_insn (dest, CONST0_RTX (mode)); > } > > diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c > b/gcc/testsuite/gcc.target/powerpc/pr113325.c > new file mode 100644 > index 000..4a3cae019c2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ > +/* { dg-require-effective-target powerpc_p8vector_ok } */ > +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */ > + > +void* foo (void* s1) > +{ > + return __builtin_memset (s1, 0, 32); > +}
[Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]
Hi, This patch eliminates unnecessary byte swaps for block clear on P8 LE. For block clear, all the bytes are set to zero. The byte order doesn't make sense. So the alignment of destination could be set to the store mode size in stead of 1 byte in order to eliminates unnecessary byte swap instructions on P8 LE. The test case shows the problem. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE gcc/ PR target/113325 * config/rs6000/rs6000-string.cc (expand_block_clear): Set the alignment of destination to the size of mode. gcc/testsuite/ PR target/113325 * gcc.target/powerpc/pr113325.c: New. patch.diff diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc index 7f777666ba9..4c9b2cbeefc 100644 --- a/gcc/config/rs6000/rs6000-string.cc +++ b/gcc/config/rs6000/rs6000-string.cc @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[]) } dest = adjust_address (orig_dest, mode, offset); - + /* Set the alignment of dest to the size of mode in order to +avoid unnecessary byte swaps on LE. */ + set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT); emit_move_insn (dest, CONST0_RTX (mode)); } diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c new file mode 100644 index 000..4a3cae019c2 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */ + +void* foo (void* s1) +{ + return __builtin_memset (s1, 0, 32); +}