https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39821
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #8)
> I've pushed the change that makes us run into ix86_multiplication_cost but
> as said that doesn't differentiate highpart or widening multiply yet and
> thus we're now missing optimizations because of too conservative costing.

For MULT_HIGHPART_EXPR, x86 only has pmulhw, so it's probably ok to go into
ix86_multiplication_cost.

For WIDEN_MULT_EXPR, we need a separate cost function which should also accept
sign info, since we have pmuludq under sse2 but pmuldq only under sse4.1,
i.e. we should vectorize udotproduct under sse2, but sdotproduct only under
sse4.1.

#include<stdint.h>

uint64_t udotproduct(uint32_t *v1, uint32_t *v2, int order)
{
    uint64_t accum = 0;
    while (order--)
        accum += (uint64_t) *v1++ * *v2++;
    return accum;
}

#include<stdint.h>

int64_t sdotproduct(int32_t *v1, int32_t *v2, int order)
{
    int64_t accum = 0;
    while (order--)
        accum += (int64_t) *v1++ * *v2++;
    return accum;
}
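To illustrate why the unsigned case is already cheap under sse2, here is a hand-written sketch of udotproduct using the pmuludq intrinsic (_mm_mul_epu32). This is not the code the vectorizer emits. The name udotproduct_sketch, the size_t length parameter and the const qualifiers are assumptions made for the example. The signed variant would need pmuldq (_mm_mul_epi32), which is only available from sse4.1, so it is left out here.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical hand-vectorized udotproduct; falls back to scalar code
   when SSE2 is not available at compile time.  */
uint64_t udotproduct_sketch(const uint32_t *v1, const uint32_t *v2, size_t n)
{
    uint64_t accum = 0;
    size_t i = 0;
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4) {
        __m128i a = _mm_loadu_si128((const __m128i *)(v1 + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(v2 + i));
        /* pmuludq: 32x32->64 unsigned multiply of the even lanes (0 and 2).  */
        __m128i even = _mm_mul_epu32(a, b);
        /* Shift the odd lanes (1 and 3) into the even positions and
           multiply them the same way.  */
        __m128i odd = _mm_mul_epu32(_mm_srli_epi64(a, 32),
                                    _mm_srli_epi64(b, 32));
        acc = _mm_add_epi64(acc, _mm_add_epi64(even, odd));
    }
    /* Horizontal add of the two 64-bit partial sums.  */
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, acc);
    accum = lanes[0] + lanes[1];
#endif
    /* Scalar tail (or the whole loop when SSE2 is unavailable).  */
    for (; i < n; i++)
        accum += (uint64_t)v1[i] * v2[i];
    return accum;
}
```

Each group of four elements costs two pmuludq plus shifts and adds, which is the kind of sign-dependent cost a separate WIDEN_MULT_EXPR cost function would have to account for.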