bruno added a comment.


> As you can see, the type legalizer handles vec3 load/store properly. It does 
> not write the 4th element. The vec3 load/store generates more instructions but 
> it has correct behavior. I am not 100% sure whether the vec3 --> vec4 
> load/store is correct, because no one has complained about it. But if the 
> vec3 --> vec4 load/store is correct, llvm's type legalizer or some other part 
> of llvm's codegen could follow that approach too to generate optimal code.

Thanks for the nice investigation/explanation on amdgcn.

> As a result, I think it would be good for clang to have both features, and 
> I would like to stick to the option "-fpreserve-vec3" to change the behavior 
> easily.

The motivation doesn't seem solid to me. Who else is going to benefit from this 
flag? You also didn't explain why doing this transformation yourself (looking 
through the shuffle) in your downstream pass isn't enough for you. We generally 
try to avoid adding flags without a good reason.


https://reviews.llvm.org/D30810


