What is the status at the moment? Which compiler, and which compiler flags, should I use to achieve maximum performance?
In general, gdc or ldc. I'm not sure how good vectorization is though, especially auto-vectorization. On the other hand, the so-called vector operations like a[] = b[] + c[]; are lowered to hand-written SSE assembly even in dmd.
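For reference, here is a minimal sketch of the vector (array) operation syntax mentioned above. The helper name addArrays and the sample values are made up for illustration, and the build lines in the comments are only commonly used optimization flags as far as I know, not a tuned recommendation; check your compiler's documentation before relying on them.

```d
import std.stdio : writeln;

// Possible optimized builds (assumed typical flags, verify for your version):
//   ldc2 -O3 -release -mcpu=native app.d
//   gdc  -O3 -frelease -march=native app.d -o app
//   dmd  -O -release -inline app.d

// Element-wise sum written with D's array operation syntax.
// addArrays is a hypothetical helper used only for this example.
float[] addArrays(const float[] b, const float[] c)
{
    assert(b.length == c.length, "operands must have the same length");
    auto a = new float[](b.length);
    a[] = b[] + c[]; // array operation: no explicit loop in the source
    return a;
}

void main()
{
    float[] b = [1, 2, 3, 4];
    float[] c = [10, 20, 30, 40];
    writeln(addArrays(b, c));
}
```

Whether this form is faster than a plain foreach loop depends on the compiler and target, so it is worth benchmarking both with the compiler you end up using.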