David:
Well, the interesting question is: why is it slower? I checked it twice; the data passed to the GPU is 100% the same. The only difference is the format it is stored in on the CPU (and that's just a matter of casting).
It's not easy to answer such general questions. Why don't you list the assembly of the two versions and compare them?
Bye, bearophile