https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78348
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-11-15
     Ever confirmed|0                           |1

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
> The issue is that memcpy must be produced instead of memmove which does
> not have optimized version for avx2 x86 and simply uses byte copy.

I'd have expected at least an

  if (! overlap) memcpy () else byte-copy

in memmove.

Note the loop distribution code doesn't try to be clever in choosing
memcpy over memmove (using dependence analysis).  So improving loop
distribution (adding a PKIND_MEMMOVE and conservatively using that
from dependence analysis) is a possibility as well.  But we have

(compute_affine_dependence
  stmt_a: _2 = par.0_1->x2[i_19][j_20];
  stmt_b: par.0_1->x1[i_19][j_20] = _2;
(analyze_overlapping_iterations
  (chrec_a = {0, +, 1}_2)
  (chrec_b = {0, +, 1}_2)
  (overlap_iterations_a = [0])
  (overlap_iterations_b = [0]))
(analyze_overlapping_iterations
  (chrec_a = i_19)
  (chrec_b = i_19)
  (overlap_iterations_a = [0])
  (overlap_iterations_b = [0]))
(analyze_overlapping_iterations
  (chrec_a = 33280)
  (chrec_b = 12800)
(analyze_ziv_subscript
)
  (overlap_iterations_a = no dependence)
  (overlap_iterations_b = no dependence))
) -> no dependence

so I think we could use memcpy for all no dependence cases?
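
For illustration only, here is a minimal C sketch of the two ideas discussed above: a copy loop of the shape shown in the dump (x1[i][j] = x2[i][j] through the same pointer), and a runtime "if (! overlap) memcpy () else ..." dispatch. The struct layout, the array sizes and the names params, regions_overlap, copy_checked and copy_x2_to_x1 are assumptions made up for this example; they are not taken from the testcase in the PR or from any libc. The overlapping path simply calls memmove here as a stand-in for the byte-copy fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the object behind `par`; sizes are arbitrary. */
enum { ROWS = 4, COLS = 8 };

struct params {
    int x1[ROWS][COLS];
    int x2[ROWS][COLS];
};

/* Returns 1 if the byte ranges [a, a+n) and [b, b+n) share a byte. */
static int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return pa < pb ? pb - pa < n : pa - pb < n;
}

/* Runtime dispatch: memcpy when the ranges are disjoint, otherwise fall
   back to memmove (standing in for the slower overlap-safe copy).  */
static void *copy_checked(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (!regions_overlap(dst, src, n))
        return memcpy(dst, src, n);
    return memmove(dst, src, n);
}

/* Loop nest of the shape in the dump.  x1 and x2 are distinct members of
   one object, so the two accesses can never touch the same bytes; this is
   the kind of loop for which plain memcpy would be sufficient.  */
static void copy_x2_to_x1(struct params *par)
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            par->x1[i][j] = par->x2[i][j];
}
```

The point of the sketch is that the disjointness which copy_checked tests at run time is already known at compile time for copy_x2_to_x1, which is what the "no dependence" result above expresses.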