The "llvm.mem.parallel_loop_access" is an annotation on loads and stores
that indicates they do not depend on other iterations of the loop.  @simd
causes these annotations to be sprinkled throughout the loop when the LLVM
IR is generated.  The lack of "load <*n* x float>" indicates that the LLVM
vectorizer gave up.  I'm not sure what spooked it.  The code looks like it
should have vectorized.  I'll investigate.
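
For anyone who wants to run the same check on their own loop, here is a
minimal sketch.  The kernel axpy! is a hypothetical stand-in, not the code
from this thread, and the exact IR text varies with the Julia and LLVM
versions, so treat the pattern match as a rough heuristic rather than a
definitive test.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical example kernel (an assumption, not the loop discussed above).
function axpy!(a::Float32, x::Vector{Float32}, y::Vector{Float32})
    @inbounds @simd for i in eachindex(x, y)
        y[i] += a * x[i]
    end
    return y
end

# Capture the generated LLVM IR for this method signature as a string.
ir = sprint(code_llvm, axpy!, Tuple{Float32, Vector{Float32}, Vector{Float32}})

# Vector types such as "<4 x float>" or "<8 x float>" in the IR suggest the
# vectorizer succeeded; their absence suggests it gave up.
vectorized = occursin(r"<\d+ x float>", ir)
println(vectorized ? "vector ops found" : "no vector ops found")
```

If no vector types show up, reading the full IR for the loop body is
usually the next step in working out what stopped the vectorizer.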
