On Thu, May 12, 2016 at 7:14 PM, Bernd Schmidt <bschm...@redhat.com> wrote:
> On 05/02/2016 03:14 PM, Richard Biener wrote:
>
>> I think it fits best in tree-ssa-strlen.c:strlen_optimize_stmt for the
>> moment.
>
> I've done this in this version. Note that this apparently means it won't
> be run at -O, unlike the previous version.
>
> The code moved to tree-ssa-strlen.c is nearly identical to the previous
> version, with the exception of a new line that copies the contents of
> gimple_call_use_set to the newly constructed call. Without this, I found
> a testcase where we miss a DSE opportunity because we think memcmp_eq can
> access anything.
>
> In the by_pieces code, I've gone ahead and progressed the C++ conversion
> a little further by making subclasses of op_by_pieces_d. Now the
> {move,store,compare}_by_pieces functions generally just make an object of
> the appropriate type and call its run method. I've also converted
> store_by_pieces to this infrastructure to eliminate more duplication.
>
> I've verified that if you disable the strlenopt code, you get identical
> code generation before and after the patch on x86_64-linux and arm-eabi.
> With the strlenopt enabled, it finds memcmp optimization opportunities on
> both targets (more on x86 due to !SLOW_UNALIGNED_ACCESS). On arm, there
> is a question mark over whether the autoinc support in the by_pieces code
> is really a good idea, but that's unchanged from before and can be
> addressed later.
>
> Microbenchmarks on x86 suggest that the new by-pieces expansion is a
> whole lot faster than calling memcmp (about a factor of 3 in my tests).
>
> Bootstrapped and tested on x86_64-linux. The testcase failed at -O (now
> changed to -O2), and there was a failure in vrp47.c which also seems to
> appear in gcc-testresults, so I think it's unrelated. Still, I'll make
> another run against a new baseline. Ok for trunk if that all comes back
> clean?
@@ -6081,9 +6056,16 @@ expand_builtin (tree exp, rtx target, rt
     case BUILT_IN_BCMP:
     case BUILT_IN_MEMCMP:
-      target = expand_builtin_memcmp (exp, target);
+    case BUILT_IN_MEMCMP_EQ:
+      target = expand_builtin_memcmp (exp, target,
+				      fcode == BUILT_IN_MEMCMP_EQ);
       if (target)
 	return target;
+      if (fcode == BUILT_IN_MEMCMP_EQ)
+	{
+	  exp = copy_node (exp);
+	  tree newdecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
+	  TREE_OPERAND (exp, 1) = build_fold_addr_expr (newdecl);
+	}

You can modify 'exp' in-place (yeah, we still build a GENERIC call from
GIMPLE just because nobody "fixed" downstream routines to accept a GIMPLE
call).

I'm not much of a fan of C++-ification (in this case it makes review
harder), but well ...

+  if (tree_fits_uhwi_p (len)
+      && (leni = tree_to_uhwi (len)) <= GET_MODE_SIZE (word_mode)
+      && exact_log2 (leni) != -1
+      && (align1 = get_pointer_alignment (arg1)) >= leni * BITS_PER_UNIT
+      && (align2 = get_pointer_alignment (arg2)) >= leni * BITS_PER_UNIT)

I think * BITS_PER_UNIT has to be * 8 here, as the C standard defines it
that way.  I think to increase coverage you want

       || ! SLOW_UNALIGNED_ACCESS (TYPE_MODE (), align1)

here and build properly mis-aligned access types for the MEM_REF, like

  if (src_align < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
    srctype = build_aligned_type (type, src_align);

(copied from gimple_fold_builtin_memory_op).

+      type = build_nonstandard_integer_type (leni * BITS_PER_UNIT, 1);
+      gcc_assert (GET_MODE_SIZE (TYPE_MODE (type)) == leni);

Likewise leni * 8 here, and the assert would have to check
GET_MODE_SIZE () * BITS_PER_UNIT == leni * 8 (or use GET_MODE_BITSIZE).

+      arg1 = build2_loc (loc, MEM_REF, type, arg1, off);
+      arg2 = build2_loc (loc, MEM_REF, type, arg2, off);

You want to add

  tree tem = fold_const_aggregate_ref (arg1);
  if (tem)
    arg1 = tem;

and the same for arg2.  That avoids leaving unfolded memory refs in the IL
after the transform.

+	  tree newfn = builtin_decl_explicit (BUILT_IN_MEMCMP_EQ);
+	  gcall *repl = gimple_build_call (newfn, 3, arg1, arg2, len);
+	  gimple_call_set_lhs (repl, res);
+	  gimple_set_location (repl, gimple_location (stmt2));
+	  gimple_set_vuse (repl, gimple_vuse (stmt2));
+	  *gimple_call_use_set (repl) = *gimple_call_use_set (stmt2);
+	  gcc_assert (!gimple_vdef (stmt2));
+	  gsi_replace (gsi, repl, false);
+	  return false;

I think you can avoid all this by just doing

  gimple_call_set_fndecl (stmt2, builtin_decl_explicit (BUILT_IN_MEMCMP_EQ));

as MEMCMP and MEMCMP_EQ have equal signatures.

Ok with those changes.  I've appended untested sketches for these three
points below.

Thanks,
Richard.
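p.s. Some untested sketches of the above, in case that helps.

For the expand_builtin hunk, the in-place variant I mean is simply your
fallback block minus the copy_node (same names as in your patch; operand 1
of the CALL_EXPR is the callee address):

      if (fcode == BUILT_IN_MEMCMP_EQ)
	{
	  /* Expansion failed, so a library call to plain memcmp will be
	     emitted; redirect the callee of the existing CALL_EXPR in
	     place instead of copying the node first.  */
	  tree newdecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
	  TREE_OPERAND (exp, 1) = build_fold_addr_expr (newdecl);
	}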
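For the mis-aligned case, the MEM_REF types would be built along these
lines (type1/type2 are made-up names; the pattern is the one quoted above
from gimple_fold_builtin_memory_op, with align1/align2 in bits as returned
by get_pointer_alignment):

      tree type1 = type, type2 = type;
      /* Give each MEM_REF a type carrying the actual alignment of its
	 pointer so expansion emits correct unaligned loads.  */
      if (align1 < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
	type1 = build_aligned_type (type, align1);
      if (align2 < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
	type2 = build_aligned_type (type, align2);
      arg1 = build2_loc (loc, MEM_REF, type1, arg1, off);
      arg2 = build2_loc (loc, MEM_REF, type2, arg2, off);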
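And the whole replacement sequence in the strlen pass then shrinks to the
following (the lhs, arguments, vuse and location of the existing call all
stay valid, so only the callee changes; the update_stmt is my addition, to
refresh the operand caches):

      /* memcmp and __builtin_memcmp_eq take the same arguments and
	 return the same type, so redirect the existing call statement
	 rather than building a new one.  */
      gimple_call_set_fndecl (stmt2,
			      builtin_decl_explicit (BUILT_IN_MEMCMP_EQ));
      update_stmt (stmt2);
      return false;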