https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933

--- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #1)
> Is that actually faster though?  The original has shorter dependency
> chains.  Or is this to avoid some LHS/SHL?

Yes, I tested it with one constructed case: the original version takes 18.20s,
while the optimized version takes 8.40s. And yes, I guess it's due to LHS/SHL,
similar to the vec_insert issue xionghu is working on.
