https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Kewen Lin from comment #7)
> Two questions in mind, need to dig into it further:
> 1) from the assembly of scalar/vector code, I don't see any stores needed
> into temp array d (array diff in pixel_sub_wxh), but when modeling we
> consider the stores.

Because when modeling they are still there. There's no good way around this.

> On Power two vector stores take cost 2 while 16 scalar
> stores takes cost 16, it seems wrong to cost model something useless. Later,
> for the vector version we need 16 vector halfword extractions from these two
> halfword vectors, while scalar version the values are just in GPR register,
> vector version looks inefficient.
> 2) on Power, the conversion from unsigned char to unsigned short is nop
> conversion, when we counting scalar cost, it's counted, then add costs 32
> totally onto scalar cost. Meanwhile, the conversion from unsigned short to
> signed short should be counted but it's not (need to check why further).
> The nop conversion costing looks something we can handle in function
> rs6000_adjust_vect_cost_per_stmt, I tried to use the generic function
> tree_nop_conversion_p, but it's only for same mode/precision conversion.
> Will find/check something else.