Re: PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops

2020-10-23 Thread H.J. Lu via Gcc-patches
On Sun, Oct 18, 2020 at 8:16 AM Jan Hubicka  wrote:
>
> > On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu  wrote:
> > >
> > > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu  wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu  wrote:
> > > > >
> > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu  wrote:
> > > > > >
> > > > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sun, May 17, 2020 at 7:06 PM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Duplicate the cmpstrn pattern for cmpmem.  The only difference 
> > > > > > > > is that
> > > > > > > > the length argument of cmpmem is guaranteed to be less than or 
> > > > > > > > equal to
> > > > > > > > lengths of 2 memory areas.  Since "repz cmpsb" can be much 
> > > > > > > > slower than
> > > > > > > > memcmp function implemented with vector instruction, see
> > > > > > > >
> > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> > > > > > > >
> > > > > > > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only.
> > > > > > >
> > > > > > > If there is no benefit compared to the library implementation, 
> > > > > > > then
> > > > > > > enable these patterns only when -minline-all-stringops is used.
> > > > > >
> > > > > > Fixed.
> > > > > >
> > > > > > > Eventually these should be reimplemented with SSE4 string 
> > > > > > > instructions.
> > > > > > >
> > > > > > > Honza is the author of the block handling x86 system, I'll leave 
> > > > > > > the
> > > > > > > review to him.
> > > > > >
> > > > > > We used to expand memcmp to "repz cmpsb" via cmpstrnsi.  It was 
> > > > > > changed
> > > > > > by
> > > > > >
> > > > > > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f
> > > > > > Author: Nick Clifton 
> > > > > > Date:   Fri Aug 12 16:26:11 2011 +
> > > > > >
> > > > > > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi 
> > > > > > pattern.
> > > > > >
> > > > > > * builtins.c (expand_builtin_memcmp): Do not use 
> > > > > > cmpstrnsi
> > > > > > pattern.
> > > > > > * doc/md.texi (cmpstrn): Note that the comparison stops 
> > > > > > if both
> > > > > > fetched bytes are zero.
> > > > > > (cmpstr): Likewise.
> > > > > > (cmpmem): Note that the comparison does not stop if 
> > > > > > both of the
> > > > > > fetched bytes are zero.
> > > > > >
> > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151
> > > > > >
> > > > > > is a regression.
> > > > > >
> > > > > > Honza, can you take a look at this?
> > > > > >
> > > > >
> > > > > PING:
> > > > >
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html
> > > > >
> > > >
> > > > PING.
> > > >
> > >
> > > PING.
> > >
> >
> > I'd like to check it in next Tuesday if there are no comments.
>
> I still plan to intorduce the two-level optimize_for_size predicates.
> Will try to do that by tuesday.

Any updates on this?  My patch is still needed to generate cmpmemsi
with -minline-all-stringops.  If there are comments, I will check it in next
Monday.

--
H.J.


Re: PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops

2020-10-18 Thread H.J. Lu via Gcc-patches
On Sun, Oct 18, 2020 at 8:16 AM Jan Hubicka  wrote:
>
> > On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu  wrote:
> > >
> > > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu  wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu  wrote:
> > > > >
> > > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu  wrote:
> > > > > >
> > > > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sun, May 17, 2020 at 7:06 PM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Duplicate the cmpstrn pattern for cmpmem.  The only difference 
> > > > > > > > is that
> > > > > > > > the length argument of cmpmem is guaranteed to be less than or 
> > > > > > > > equal to
> > > > > > > > lengths of 2 memory areas.  Since "repz cmpsb" can be much 
> > > > > > > > slower than
> > > > > > > > memcmp function implemented with vector instruction, see
> > > > > > > >
> > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> > > > > > > >
> > > > > > > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only.
> > > > > > >
> > > > > > > If there is no benefit compared to the library implementation, 
> > > > > > > then
> > > > > > > enable these patterns only when -minline-all-stringops is used.
> > > > > >
> > > > > > Fixed.
> > > > > >
> > > > > > > Eventually these should be reimplemented with SSE4 string 
> > > > > > > instructions.
> > > > > > >
> > > > > > > Honza is the author of the block handling x86 system, I'll leave 
> > > > > > > the
> > > > > > > review to him.
> > > > > >
> > > > > > We used to expand memcmp to "repz cmpsb" via cmpstrnsi.  It was 
> > > > > > changed
> > > > > > by
> > > > > >
> > > > > > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f
> > > > > > Author: Nick Clifton 
> > > > > > Date:   Fri Aug 12 16:26:11 2011 +
> > > > > >
> > > > > > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi 
> > > > > > pattern.
> > > > > >
> > > > > > * builtins.c (expand_builtin_memcmp): Do not use 
> > > > > > cmpstrnsi
> > > > > > pattern.
> > > > > > * doc/md.texi (cmpstrn): Note that the comparison stops 
> > > > > > if both
> > > > > > fetched bytes are zero.
> > > > > > (cmpstr): Likewise.
> > > > > > (cmpmem): Note that the comparison does not stop if 
> > > > > > both of the
> > > > > > fetched bytes are zero.
> > > > > >
> > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151
> > > > > >
> > > > > > is a regression.
> > > > > >
> > > > > > Honza, can you take a look at this?
> > > > > >
> > > > >
> > > > > PING:
> > > > >
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html
> > > > >
> > > >
> > > > PING.
> > > >
> > >
> > > PING.
> > >
> >
> > I'd like to check it in next Tuesday if there are no comments.
>
> I still plan to intorduce the two-level optimize_for_size predicates.
> Will try to do that by tuesday.
>

Thanks.

BTW, this patch is about inlining memcmp with -minline-all-stringops.
It is very important for user interrupt codes (UINTR) not to call memcmp
since memcmp in glibc uses vector registers which shouldn't be used in
user interrupt codes.

-- 
H.J.


Re: PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops

2020-10-18 Thread Jan Hubicka
> On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu  wrote:
> >
> > On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu  wrote:
> > >
> > > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu  wrote:
> > > >
> > > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu  wrote:
> > > > >
> > > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak  wrote:
> > > > > >
> > > > > > On Sun, May 17, 2020 at 7:06 PM H.J. Lu  wrote:
> > > > > > >
> > > > > > > Duplicate the cmpstrn pattern for cmpmem.  The only difference is 
> > > > > > > that
> > > > > > > the length argument of cmpmem is guaranteed to be less than or 
> > > > > > > equal to
> > > > > > > lengths of 2 memory areas.  Since "repz cmpsb" can be much slower 
> > > > > > > than
> > > > > > > memcmp function implemented with vector instruction, see
> > > > > > >
> > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> > > > > > >
> > > > > > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only.
> > > > > >
> > > > > > If there is no benefit compared to the library implementation, then
> > > > > > enable these patterns only when -minline-all-stringops is used.
> > > > >
> > > > > Fixed.
> > > > >
> > > > > > Eventually these should be reimplemented with SSE4 string 
> > > > > > instructions.
> > > > > >
> > > > > > Honza is the author of the block handling x86 system, I'll leave the
> > > > > > review to him.
> > > > >
> > > > > We used to expand memcmp to "repz cmpsb" via cmpstrnsi.  It was 
> > > > > changed
> > > > > by
> > > > >
> > > > > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f
> > > > > Author: Nick Clifton 
> > > > > Date:   Fri Aug 12 16:26:11 2011 +
> > > > >
> > > > > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi pattern.
> > > > >
> > > > > * builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi
> > > > > pattern.
> > > > > * doc/md.texi (cmpstrn): Note that the comparison stops 
> > > > > if both
> > > > > fetched bytes are zero.
> > > > > (cmpstr): Likewise.
> > > > > (cmpmem): Note that the comparison does not stop if both 
> > > > > of the
> > > > > fetched bytes are zero.
> > > > >
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151
> > > > >
> > > > > is a regression.
> > > > >
> > > > > Honza, can you take a look at this?
> > > > >
> > > >
> > > > PING:
> > > >
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html
> > > >
> > >
> > > PING.
> > >
> >
> > PING.
> >
> 
> I'd like to check it in next Tuesday if there are no comments.

I still plan to intorduce the two-level optimize_for_size predicates.
Will try to do that by tuesday.

Honza
> 
> -- 
> H.J.


Re: PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops

2020-10-17 Thread H.J. Lu via Gcc-patches
On Fri, Oct 2, 2020 at 6:21 AM H.J. Lu  wrote:
>
> On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu  wrote:
> >
> > On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu  wrote:
> > >
> > > On Tue, May 19, 2020 at 5:14 AM H.J. Lu  wrote:
> > > >
> > > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak  wrote:
> > > > >
> > > > > On Sun, May 17, 2020 at 7:06 PM H.J. Lu  wrote:
> > > > > >
> > > > > > Duplicate the cmpstrn pattern for cmpmem.  The only difference is 
> > > > > > that
> > > > > > the length argument of cmpmem is guaranteed to be less than or 
> > > > > > equal to
> > > > > > lengths of 2 memory areas.  Since "repz cmpsb" can be much slower 
> > > > > > than
> > > > > > memcmp function implemented with vector instruction, see
> > > > > >
> > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> > > > > >
> > > > > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only.
> > > > >
> > > > > If there is no benefit compared to the library implementation, then
> > > > > enable these patterns only when -minline-all-stringops is used.
> > > >
> > > > Fixed.
> > > >
> > > > > Eventually these should be reimplemented with SSE4 string 
> > > > > instructions.
> > > > >
> > > > > Honza is the author of the block handling x86 system, I'll leave the
> > > > > review to him.
> > > >
> > > > We used to expand memcmp to "repz cmpsb" via cmpstrnsi.  It was changed
> > > > by
> > > >
> > > > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f
> > > > Author: Nick Clifton 
> > > > Date:   Fri Aug 12 16:26:11 2011 +
> > > >
> > > > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi pattern.
> > > >
> > > > * builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi
> > > > pattern.
> > > > * doc/md.texi (cmpstrn): Note that the comparison stops if 
> > > > both
> > > > fetched bytes are zero.
> > > > (cmpstr): Likewise.
> > > > (cmpmem): Note that the comparison does not stop if both of 
> > > > the
> > > > fetched bytes are zero.
> > > >
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151
> > > >
> > > > is a regression.
> > > >
> > > > Honza, can you take a look at this?
> > > >
> > >
> > > PING:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html
> > >
> >
> > PING.
> >
>
> PING.
>

I'd like to check it in next Tuesday if there are no comments.

-- 
H.J.


PING^3 [PATCH] x86: Add cmpmemsi for -minline-all-stringops

2020-10-02 Thread H.J. Lu via Gcc-patches
On Wed, Sep 16, 2020 at 10:07 PM H.J. Lu  wrote:
>
> On Wed, Aug 19, 2020 at 6:09 AM H.J. Lu  wrote:
> >
> > On Tue, May 19, 2020 at 5:14 AM H.J. Lu  wrote:
> > >
> > > On Tue, May 19, 2020 at 1:48 AM Uros Bizjak  wrote:
> > > >
> > > > On Sun, May 17, 2020 at 7:06 PM H.J. Lu  wrote:
> > > > >
> > > > > Duplicate the cmpstrn pattern for cmpmem.  The only difference is that
> > > > > the length argument of cmpmem is guaranteed to be less than or equal 
> > > > > to
> > > > > lengths of 2 memory areas.  Since "repz cmpsb" can be much slower than
> > > > > memcmp function implemented with vector instruction, see
> > > > >
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> > > > >
> > > > > expand cmpmem to "repz cmpsb" only with -mgeneral-regs-only.
> > > >
> > > > If there is no benefit compared to the library implementation, then
> > > > enable these patterns only when -minline-all-stringops is used.
> > >
> > > Fixed.
> > >
> > > > Eventually these should be reimplemented with SSE4 string instructions.
> > > >
> > > > Honza is the author of the block handling x86 system, I'll leave the
> > > > review to him.
> > >
> > > We used to expand memcmp to "repz cmpsb" via cmpstrnsi.  It was changed
> > > by
> > >
> > > commit 9b0f6f5e511ca512e4faeabc81d2fd3abad9b02f
> > > Author: Nick Clifton 
> > > Date:   Fri Aug 12 16:26:11 2011 +
> > >
> > > builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi pattern.
> > >
> > > * builtins.c (expand_builtin_memcmp): Do not use cmpstrnsi
> > > pattern.
> > > * doc/md.texi (cmpstrn): Note that the comparison stops if 
> > > both
> > > fetched bytes are zero.
> > > (cmpstr): Likewise.
> > > (cmpmem): Note that the comparison does not stop if both of 
> > > the
> > > fetched bytes are zero.
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95151
> > >
> > > is a regression.
> > >
> > > Honza, can you take a look at this?
> > >
> >
> > PING:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546921.html
> >
>
> PING.
>

PING.

-- 
H.J.