Re: PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops
On Sun, Oct 18, 2020 at 8:16 AM Jan Hubicka wrote: > > > On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu wrote: > > > > > > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote: > > > > > > > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote: > > > > > > > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote: > > > > > > > > > > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak > > > > > > wrote: > > > > > > > > > > > > > > On Sun, May 17, 2020 at 7:06 PM H.J. Lu > > > > > > > wrote: > > > > > > > > > > > > > > > > Duplicate the cmpstrn pattern for cmpmem. The only difference > > > > > > > > is that > > > > > > > > the length argument of cmpmem is guaranteed to be less than or > > > > > > > > equal to > > > > > > > > lengths of 2 memory areas. Since "repz cmpsb" can be much > > > > > > > > slower than > > > > > > > > memcmp function implemented with vector instruction, see > > > > > > > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > > > > > > > > > > > > > > > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only. > > > > > > > > > > > > > > If there is no benefit compared to the library implementation, > > > > > > > then > > > > > > > enable these patterns only when -minline-all-stringops is used. > > > > > > > > > > > > Fixed. > > > > > > > > > > > > > Eventually these should be reimplemented with SSE4 string > > > > > > > instructions. > > > > > > > > > > > > > > Honza is the author of the block handling x86 system, I'll leave > > > > > > > the > > > > > > > review to him. > > > > > > > > > > > > We used to expand memcmp to "repz cmpsb" via cmpstrnsi. It was > > > > > > changed > > > > > > by > > > > > > > > > > > > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f > > > > > > Author: Nick Clifton > > > > > > Date: Fri Aug 12 16:26:11 2011 + > > > > > > > > > > > > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi > > > > > > pattern. > > > > > > > > > > > > * builtins.c (expand_builtin_memcmp): Do not use > > > > > > cmpstrnsi > > > > > > pattern. > > > > > > * doc/md.texi (cmpstrn): Note that the comparison stops > > > > > > if both > > > > > > fetched bytes are zero. > > > > > > (cmpstr): Likewise. > > > > > > (cmpmem): Note that the comparison does not stop if > > > > > > both of the > > > > > > fetched bytes are zero. > > > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151 > > > > > > > > > > > > is a regression. > > > > > > > > > > > > Honza, can you take a look at this? > > > > > > > > > > > > > > > > PING: > > > > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html > > > > > > > > > > > > > PING. > > > > > > > > > > PING. > > > > > > > I'd like to check it in next Tuesday if there are no comments. > > I still plan to intorduce the two-level optimize_for_size predicates. > Will try to do that by tuesday. Any updates on this? My patch is still needed to generate cmpmemsi with -minline-all-stringops. If there are comments, I will check it in next Monday. -- H.J.
Re: PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops
On Sun, Oct 18, 2020 at 8:16 AM Jan Hubicka wrote: > > > On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu wrote: > > > > > > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote: > > > > > > > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote: > > > > > > > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote: > > > > > > > > > > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak > > > > > > wrote: > > > > > > > > > > > > > > On Sun, May 17, 2020 at 7:06 PM H.J. Lu > > > > > > > wrote: > > > > > > > > > > > > > > > > Duplicate the cmpstrn pattern for cmpmem. The only difference > > > > > > > > is that > > > > > > > > the length argument of cmpmem is guaranteed to be less than or > > > > > > > > equal to > > > > > > > > lengths of 2 memory areas. Since "repz cmpsb" can be much > > > > > > > > slower than > > > > > > > > memcmp function implemented with vector instruction, see > > > > > > > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > > > > > > > > > > > > > > > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only. > > > > > > > > > > > > > > If there is no benefit compared to the library implementation, > > > > > > > then > > > > > > > enable these patterns only when -minline-all-stringops is used. > > > > > > > > > > > > Fixed. > > > > > > > > > > > > > Eventually these should be reimplemented with SSE4 string > > > > > > > instructions. > > > > > > > > > > > > > > Honza is the author of the block handling x86 system, I'll leave > > > > > > > the > > > > > > > review to him. > > > > > > > > > > > > We used to expand memcmp to "repz cmpsb" via cmpstrnsi. It was > > > > > > changed > > > > > > by > > > > > > > > > > > > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f > > > > > > Author: Nick Clifton > > > > > > Date: Fri Aug 12 16:26:11 2011 + > > > > > > > > > > > > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi > > > > > > pattern. > > > > > > > > > > > > * builtins.c (expand_builtin_memcmp): Do not use > > > > > > cmpstrnsi > > > > > > pattern. > > > > > > * doc/md.texi (cmpstrn): Note that the comparison stops > > > > > > if both > > > > > > fetched bytes are zero. > > > > > > (cmpstr): Likewise. > > > > > > (cmpmem): Note that the comparison does not stop if > > > > > > both of the > > > > > > fetched bytes are zero. > > > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151 > > > > > > > > > > > > is a regression. > > > > > > > > > > > > Honza, can you take a look at this? > > > > > > > > > > > > > > > > PING: > > > > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html > > > > > > > > > > > > > PING. > > > > > > > > > > PING. > > > > > > > I'd like to check it in next Tuesday if there are no comments. > > I still plan to intorduce the two-level optimize_for_size predicates. > Will try to do that by tuesday. > Thanks. BTW, this patch is about inlining memcmp with -minline-all-stringops. It is very important for user interrupt codes (UINTR) not to call memcmp since memcmp in glibc uses vector registers which shouldn't be used in user interrupt codes. -- H.J.
Re: PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops
> On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu wrote: > > > > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote: > > > > > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote: > > > > > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote: > > > > > > > > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak wrote: > > > > > > > > > > > > On Sun, May 17, 2020 at 7:06 PM H.J. Lu wrote: > > > > > > > > > > > > > > Duplicate the cmpstrn pattern for cmpmem. The only difference is > > > > > > > that > > > > > > > the length argument of cmpmem is guaranteed to be less than or > > > > > > > equal to > > > > > > > lengths of 2 memory areas. Since "repz cmpsb" can be much slower > > > > > > > than > > > > > > > memcmp function implemented with vector instruction, see > > > > > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > > > > > > > > > > > > > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only. > > > > > > > > > > > > If there is no benefit compared to the library implementation, then > > > > > > enable these patterns only when -minline-all-stringops is used. > > > > > > > > > > Fixed. > > > > > > > > > > > Eventually these should be reimplemented with SSE4 string > > > > > > instructions. > > > > > > > > > > > > Honza is the author of the block handling x86 system, I'll leave the > > > > > > review to him. > > > > > > > > > > We used to expand memcmp to "repz cmpsb" via cmpstrnsi. It was > > > > > changed > > > > > by > > > > > > > > > > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f > > > > > Author: Nick Clifton > > > > > Date: Fri Aug 12 16:26:11 2011 + > > > > > > > > > > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi pattern. > > > > > > > > > > * builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi > > > > > pattern. > > > > > * doc/md.texi (cmpstrn): Note that the comparison stops > > > > > if both > > > > > fetched bytes are zero. > > > > > (cmpstr): Likewise. > > > > > (cmpmem): Note that the comparison does not stop if both > > > > > of the > > > > > fetched bytes are zero. > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151 > > > > > > > > > > is a regression. > > > > > > > > > > Honza, can you take a look at this? > > > > > > > > > > > > > PING: > > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html > > > > > > > > > > PING. > > > > > > > PING. > > > > I'd like to check it in next Tuesday if there are no comments. I still plan to intorduce the two-level optimize_for_size predicates. Will try to do that by tuesday. Honza > > -- > H.J.
Re: PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops
On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu wrote: > > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote: > > > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote: > > > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote: > > > > > > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak wrote: > > > > > > > > > > On Sun, May 17, 2020 at 7:06 PM H.J. Lu wrote: > > > > > > > > > > > > Duplicate the cmpstrn pattern for cmpmem. The only difference is > > > > > > that > > > > > > the length argument of cmpmem is guaranteed to be less than or > > > > > > equal to > > > > > > lengths of 2 memory areas. Since "repz cmpsb" can be much slower > > > > > > than > > > > > > memcmp function implemented with vector instruction, see > > > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > > > > > > > > > > > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only. > > > > > > > > > > If there is no benefit compared to the library implementation, then > > > > > enable these patterns only when -minline-all-stringops is used. > > > > > > > > Fixed. > > > > > > > > > Eventually these should be reimplemented with SSE4 string > > > > > instructions. > > > > > > > > > > Honza is the author of the block handling x86 system, I'll leave the > > > > > review to him. > > > > > > > > We used to expand memcmp to "repz cmpsb" via cmpstrnsi. It was changed > > > > by > > > > > > > > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f > > > > Author: Nick Clifton > > > > Date: Fri Aug 12 16:26:11 2011 + > > > > > > > > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi pattern. > > > > > > > > * builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi > > > > pattern. > > > > * doc/md.texi (cmpstrn): Note that the comparison stops if > > > > both > > > > fetched bytes are zero. > > > > (cmpstr): Likewise. > > > > (cmpmem): Note that the comparison does not stop if both of > > > > the > > > > fetched bytes are zero. > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151 > > > > > > > > is a regression. > > > > > > > > Honza, can you take a look at this? > > > > > > > > > > PING: > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html > > > > > > > PING. > > > > PING. > I'd like to check it in next Tuesday if there are no comments. -- H.J.
PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops
On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu wrote: > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu wrote: > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu wrote: > > > > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak wrote: > > > > > > > > On Sun, May 17, 2020 at 7:06 PM H.J. Lu wrote: > > > > > > > > > > Duplicate the cmpstrn pattern for cmpmem. The only difference is that > > > > > the length argument of cmpmem is guaranteed to be less than or equal > > > > > to > > > > > lengths of 2 memory areas. Since "repz cmpsb" can be much slower than > > > > > memcmp function implemented with vector instruction, see > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > > > > > > > > > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only. > > > > > > > > If there is no benefit compared to the library implementation, then > > > > enable these patterns only when -minline-all-stringops is used. > > > > > > Fixed. > > > > > > > Eventually these should be reimplemented with SSE4 string instructions. > > > > > > > > Honza is the author of the block handling x86 system, I'll leave the > > > > review to him. > > > > > > We used to expand memcmp to "repz cmpsb" via cmpstrnsi. It was changed > > > by > > > > > > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f > > > Author: Nick Clifton > > > Date: Fri Aug 12 16:26:11 2011 + > > > > > > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi pattern. > > > > > > * builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi > > > pattern. > > > * doc/md.texi (cmpstrn): Note that the comparison stops if > > > both > > > fetched bytes are zero. > > > (cmpstr): Likewise. > > > (cmpmem): Note that the comparison does not stop if both of > > > the > > > fetched bytes are zero. > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151 > > > > > > is a regression. > > > > > > Honza, can you take a look at this? > > > > > > > PING: > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html > > > > PING. > PING. -- H.J.