nnet-test after r242038

rguenth at gcc dot gnu.org Tue, 15 Nov 2016 03:51:06 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78348


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-11-15
     Ever confirmed|0                           |1

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
> The issue is that memcpy must be produced instead of memmove, which does
> not have an optimized version for avx2 x86 and simply uses byte copy.

I'd have expected at least an if (! overlap) memcpy () else byte-copy.
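For illustration only, the dispatch described above could look roughly like
the sketch below.  The name copy_maybe_overlapping is hypothetical, and this
is not how glibc or GCC implement memmove; it only shows the idea of taking
the fast memcpy path when the two ranges are disjoint and falling back to a
direction-aware byte copy otherwise.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: a memmove-like copy that uses memcpy whenever
   the source and destination ranges do not overlap.  */
static void *
copy_maybe_overlapping (void *dst, const void *src, size_t n)
{
  uintptr_t d = (uintptr_t) dst;
  uintptr_t s = (uintptr_t) src;
  unsigned char *dp = dst;
  const unsigned char *sp = src;

  /* The ranges are disjoint when the distance between the two start
     addresses is at least n bytes.  */
  uintptr_t dist = d >= s ? d - s : s - d;
  if (dist >= n)
    return memcpy (dst, src, n);

  if (d < s)
    {
      /* Destination starts below source: copy forwards.  */
      for (size_t i = 0; i < n; i++)
        dp[i] = sp[i];
    }
  else if (d > s)
    {
      /* Destination starts above source: copy backwards so bytes are
         read before they are overwritten.  */
      for (size_t i = n; i > 0; i--)
        dp[i - 1] = sp[i - 1];
    }
  /* d == s: nothing to do.  */
  return dst;
}
```

The overlap test is a single subtraction and comparison, so the disjoint
case pays very little before reaching the optimized copy.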

Note the loop distribution code doesn't try to be clever in choosing memcpy
over memmove (using dependence analysis).  So improving loop distribution
(adding a PKIND_MEMMOVE and conservatively using that from dependence analysis)
is a possibility as well.  But we have

(compute_affine_dependence
  stmt_a: _2 = par.0_1->x2[i_19][j_20];
  stmt_b: par.0_1->x1[i_19][j_20] = _2;
(analyze_overlapping_iterations
  (chrec_a = {0, +, 1}_2)
  (chrec_b = {0, +, 1}_2)
  (overlap_iterations_a = [0])
  (overlap_iterations_b = [0]))
(analyze_overlapping_iterations
  (chrec_a = i_19)
  (chrec_b = i_19)
  (overlap_iterations_a = [0])
  (overlap_iterations_b = [0]))
(analyze_overlapping_iterations
  (chrec_a = 33280)
  (chrec_b = 12800)
(analyze_ziv_subscript
)
  (overlap_iterations_a = no dependence)
  (overlap_iterations_b = no dependence))
) -> no dependence

So I think we could use memcpy for all no-dependence cases?
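As a rough sketch of the kind of loop the dump above describes: the
statements read par->x2[i][j] and write par->x1[i][j], i.e. two different
members of the same object.  The struct layout, element type and array
sizes below are assumptions made for illustration, not taken from the
coremark-pro source.  Under those assumptions the two members can never
overlap, so the loop nest and a single memcpy produce the same result.

```c
#include <string.h>

/* Hypothetical sizes and layout, chosen only for the example.  */
enum { ROWS = 4, COLS = 8 };

struct params
{
  double x1[ROWS][COLS];
  double x2[ROWS][COLS];
};

/* The loop form: element-wise copy of x2 into x1.  */
static void
copy_loop (struct params *par)
{
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      par->x1[i][j] = par->x2[i][j];
}

/* The equivalent block copy.  memcpy is valid here because x1 and x2
   are distinct members and therefore disjoint.  */
static void
copy_memcpy (struct params *par)
{
  memcpy (par->x1, par->x2, sizeof par->x1);
}
```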

[Bug tree-optimization/78348] [7 REGRESSION] 15% performance drop for coremark-pro/nnet-test after r242038
