modern x86-64 processors can execute them in parallel. Because of that, the speed of your program is limited by instruction latency and not throughput.

Auto-vectorization to SIMD code (as the Java JIT does, for example) seems like it may be the ideal strategy, since the conditions for getting any performance improvement appear to be very particular and situational, and that is something the compiler may be best suited to handle. Thoughts?
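
To make the question concrete, here is a minimal sketch of the kind of distinction I have in mind. The class and method names are made up for illustration. Whether a given JIT actually vectorizes the first loop depends on the JVM version, its flags, and the hardware, so treat that part as an assumption rather than something verified.

```java
import java.util.Arrays;

// Hypothetical example class; not taken from any library.
class AutoVecSketch {

    // A simple counted loop over arrays with no dependency between
    // iterations. This is the kind of shape an auto-vectorizer can
    // usually handle (assumption: the JIT in use supports it).
    static void addArrays(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    // Each iteration reads the value written by the previous one, so the
    // iterations cannot simply be placed side by side in SIMD lanes.
    static void prefixSum(float[] a) {
        for (int i = 1; i < a.length; i++) {
            a[i] += a[i - 1];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {10f, 20f, 30f, 40f};
        float[] out = new float[a.length];

        addArrays(a, b, out);
        prefixSum(a);

        System.out.println(Arrays.toString(out));
        System.out.println(Arrays.toString(a));
    }
}
```

Both loops look almost identical in source, which is why it seems attractive to let the compiler decide which one is worth vectorizing instead of hand-writing SIMD for each case.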
