On Fri, 27 Jun 2014, Richard Biener wrote:

> On Fri, 27 Jun 2014, Jakub Jelinek wrote:
> 
> > On Fri, Jun 27, 2014 at 01:49:38PM +0200, Richard Biener wrote:
> > > I'm going to go for a single load/store and MOVE_MAX for now - I
> > > have quite some fallout to deal with anyway (analyzed strlenopt-1.c
> > > FAIL only, the other strlenopt cases are probably similar)
> > > 
> > > FAIL: gcc.dg/strlenopt-1.c scan-tree-dump-times strlen "strlen \\\\(" 2
> > > FAIL: gcc.dg/strlenopt-1.c scan-tree-dump-times strlen "memcpy \\\\(" 4
> > 
> > But is it really desirable to do this kind of expansion so early on GIMPLE?
> > Doing it during folding is e.g. very much harmful to offloading, the
> > offloading target might have different preferences from the host.
> > So, if it is really beneficial (do you have some benchmark that shows
> > that?), can it be done e.g. in the strlen pass or even somewhat later?
> 
> strlen pass now runs very very late, for PR61619 it is important
> before some constant propagation happening before vectorization.
> 
> But yes, it's not necessary doing that on GENERIC (nor is any of
> those foldings - but as said, it's not the objective of the patch
> to change where we do such optimizations).
> 
> Oh, and offloading targets always will have the issue facing
> IL optimized for another target (LOGICAL_OP_NON_SHORT_CIRCUIT
> for example).

Compromise "hack" below.  It simply avoids the transform for
sources that c_strlen can compute a length of.  That "fixes" all
strlenopt testcase apart from strlenopt-8.c which does
memcpy (, flag ? "a" : "b"); which then still folds
during gimplification.  I have XFAILed that.

I've tried to comb my way through strlenopt but failed to quickly
add support for generic stores (it has very rough support for
char stores, see also PR61773).

Does the below look ok?

Thanks,
Richard.

2014-07-10  Richard Biener  <rguent...@suse.de>

        PR middle-end/61473
        * builtins.c (fold_builtin_memory_op): Inline memory moves
        that can be implemented with a single load followed by a
        single store.

        * gcc.dg/memmove-4.c: New testcase.
        * gcc.dg/strlenopt-8.c: XFAIL.
        * gfortran.dg/coarray_lib_realloc_1.f90: Adjust.

Index: gcc/builtins.c
===================================================================
*** gcc/builtins.c.orig 2014-07-10 15:34:27.787477771 +0200
--- gcc/builtins.c      2014-07-10 16:20:08.071289106 +0200
*************** fold_builtin_memory_op (location_t loc,
*** 8637,8647 ****
        unsigned int src_align, dest_align;
        tree off0;
  
!       if (endp == 3)
        {
!         src_align = get_pointer_alignment (src);
!         dest_align = get_pointer_alignment (dest);
  
          /* Both DEST and SRC must be pointer types.
             ??? This is what old code did.  Is the testing for pointer types
             really mandatory?
--- 8637,8693 ----
        unsigned int src_align, dest_align;
        tree off0;
  
!       /* Build accesses at offset zero with a ref-all character type.  */
!       off0 = build_int_cst (build_pointer_type_for_mode (char_type_node,
!                                                        ptr_mode, true), 0);
! 
!       /* If we can perform the copy efficiently with first doing all loads
!          and then all stores inline it that way.  Currently efficiently
!        means that we can load all the memory into a single integer
!        register which is what MOVE_MAX gives us.  */
!       src_align = get_pointer_alignment (src);
!       dest_align = get_pointer_alignment (dest);
!       if (tree_fits_uhwi_p (len)
!         && compare_tree_int (len, MOVE_MAX) <= 0
!         /* ???  Don't transform copies from strings with known length this
!            confuses the tree-ssa-strlen.c.  This doesn't handle
!            the case in gcc.dg/strlenopt-8.c which is XFAILed for that
!            reason.  */
!         && !c_strlen (src, 1))
        {
!         unsigned ilen = tree_to_uhwi (len);
!         if (exact_log2 (ilen) != -1)
!           {
!             tree type = lang_hooks.types.type_for_size (ilen * 8, 1);
!             if (type
!                 && TYPE_MODE (type) != BLKmode
!                 && (GET_MODE_SIZE (TYPE_MODE (type)) * BITS_PER_UNIT
!                     == ilen * 8)
!                 /* If the pointers are not aligned we must be able to
!                    emit an unaligned load.  */
!                 && ((src_align >= GET_MODE_ALIGNMENT (TYPE_MODE (type))
!                      && dest_align >= GET_MODE_ALIGNMENT (TYPE_MODE (type)))
!                     || !SLOW_UNALIGNED_ACCESS (TYPE_MODE (type),
!                                                MIN (src_align, dest_align))))
!               {
!                 tree srctype = type;
!                 tree desttype = type;
!                 if (src_align < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
!                   srctype = build_aligned_type (type, src_align);
!                 if (dest_align < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
!                   desttype = build_aligned_type (type, dest_align);
!                 if (!ignore)
!                   dest = builtin_save_expr (dest);
!                 expr = build2 (MODIFY_EXPR, type,
!                                fold_build2 (MEM_REF, desttype, dest, off0),
!                                fold_build2 (MEM_REF, srctype, src, off0));
!                 goto done;
!               }
!           }
!       }
  
+       if (endp == 3)
+       {
          /* Both DEST and SRC must be pointer types.
             ??? This is what old code did.  Is the testing for pointer types
             really mandatory?
*************** fold_builtin_memory_op (location_t loc,
*** 8818,8827 ****
        if (!ignore)
          dest = builtin_save_expr (dest);
  
-       /* Build accesses at offset zero with a ref-all character type.  */
-       off0 = build_int_cst (build_pointer_type_for_mode (char_type_node,
-                                                        ptr_mode, true), 0);
- 
        destvar = dest;
        STRIP_NOPS (destvar);
        if (TREE_CODE (destvar) == ADDR_EXPR
--- 8864,8869 ----
*************** fold_builtin_memory_op (location_t loc,
*** 8888,8893 ****
--- 8930,8936 ----
        expr = build2 (MODIFY_EXPR, TREE_TYPE (destvar), destvar, srcvar);
      }
  
+ done:
    if (ignore)
      return expr;
  
Index: gcc/testsuite/gcc.dg/memmove-4.c
===================================================================
*** /dev/null   1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/memmove-4.c    2014-07-10 16:09:31.035332965 +0200
***************
*** 0 ****
--- 1,12 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O -fdump-tree-optimized" } */
+ 
+ typedef int w __attribute__((mode(word)));
+ 
+ void b(char *a, char *b, int i)
+ {
+   __builtin_memmove (&a[i], &b[i], sizeof(w));
+ }
+ 
+ /* { dg-final { scan-tree-dump-not "memmove" "optimized" { xfail { ! 
non_strict_align } } } } */
+ /* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/strlenopt-8.c
===================================================================
*** gcc/testsuite/gcc.dg/strlenopt-8.c.orig     2011-09-28 10:16:34.000000000 
+0200
--- gcc/testsuite/gcc.dg/strlenopt-8.c  2014-07-10 16:21:26.501283706 +0200
*************** main ()
*** 43,50 ****
    return 0;
  }
  
! /* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */
! /* { dg-final { scan-tree-dump-times "memcpy \\(" 4 "strlen" } } */
  /* { dg-final { scan-tree-dump-times "strcpy \\(" 0 "strlen" } } */
  /* { dg-final { scan-tree-dump-times "strcat \\(" 0 "strlen" } } */
  /* { dg-final { scan-tree-dump-times "strchr \\(" 0 "strlen" } } */
--- 43,50 ----
    return 0;
  }
  
! /* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" { xfail *-*-* } 
} } */
! /* { dg-final { scan-tree-dump-times "memcpy \\(" 4 "strlen" { xfail *-*-* } 
} } */
  /* { dg-final { scan-tree-dump-times "strcpy \\(" 0 "strlen" } } */
  /* { dg-final { scan-tree-dump-times "strcat \\(" 0 "strlen" } } */
  /* { dg-final { scan-tree-dump-times "strchr \\(" 0 "strlen" } } */
Index: gcc/testsuite/gfortran.dg/coarray_lib_realloc_1.f90
===================================================================
*** gcc/testsuite/gfortran.dg/coarray_lib_realloc_1.f90.orig    2013-08-27 
11:13:18.873996208 +0200
--- gcc/testsuite/gfortran.dg/coarray_lib_realloc_1.f90 2014-07-10 
16:24:00.310273117 +0200
*************** end
*** 30,35 ****
  ! { dg-final { scan-tree-dump-times "__builtin_malloc" 1 "original" } }
  
  ! But copy "ii" and "CAF":
! ! { dg-final { scan-tree-dump-times "__builtin_memcpy" 2 "original" } }
  
  ! { dg-final { cleanup-tree-dump "original" } }
--- 30,35 ----
  ! { dg-final { scan-tree-dump-times "__builtin_malloc" 1 "original" } }
  
  ! But copy "ii" and "CAF":
! ! { dg-final { scan-tree-dump-times "__builtin_memcpy|= MEM" 2 "original" } }
  
  ! { dg-final { cleanup-tree-dump "original" } }

Reply via email to