Heinz:

> void myFunct()
> {
>     uint* p = myarray.ptr;
>     asm
>     {
>         mov EBX, p;
>
>         mov EAX, [EBX + 4];
>         rol EAX, 8;
>         mov [EBX + 4], EAX;
>
>         mov EAX, [EBX + 8];
>         rol EAX, 16;
>         mov [EBX + 8], EAX;
>
>         mov EAX, [EBX + 12];
>         rol EAX, 24;
>         mov [EBX + 12], EAX;
>     }
> }

I see you have removed the asm guard I showed you.

I suggest you benchmark it against an equivalent normal D function. Keep in mind that asm blocks kill inlining.

Also try a load-load-load processing-processing-processing store-store-store order instead of load-processing-store load-processing-store load-processing-store, because this often helps the processor's pipelining (especially when you use SSE/AVX registers).

Bye,
bearophile
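
P.S. A rough sketch of the kind of normal D function you could benchmark against, written in the load-load-load, processing, store-store-store order. The names rotl and rotateWords are made up for this example, and I pass the array as a parameter instead of reading myarray, so adapt it to your code. The indices 1, 2, 3 correspond to the byte offsets 4, 8, 12 of your asm, assuming 4-byte uints.

```d
// Hypothetical helper: rotate a 32-bit value left by n bits (0 < n < 32).
uint rotl(uint x, uint n) pure nothrow @nogc @safe
{
    return (x << n) | (x >> (32 - n));
}

// Hypothetical plain-D counterpart of myFunct.
// Expects at least 4 elements; element 0 is left untouched.
void rotateWords(uint[] a) pure nothrow @nogc @safe
{
    assert(a.length >= 4);

    // Load-load-load
    uint x1 = a[1];
    uint x2 = a[2];
    uint x3 = a[3];

    // Processing-processing-processing
    x1 = rotl(x1, 8);
    x2 = rotl(x2, 16);
    x3 = rotl(x3, 24);

    // Store-store-store
    a[1] = x1;
    a[2] = x2;
    a[3] = x3;
}

unittest
{
    uint[] a = [0x11223344, 0x11223344, 0x11223344, 0x11223344];
    rotateWords(a);
    assert(a == [0x11223344, 0x22334411, 0x33441122, 0x44112233]);
}
```

Unlike the asm version, a small function like this can be inlined by the compiler, and an optimizing back-end may turn the shift/or pattern into a rol instruction by itself. Whether it is actually faster is something only your benchmark can tell.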