Heinz:

> void myFunct()
> {
>     uint* p = myarray.ptr;
>     asm
>     {
>         mov EBX, p;
> 
>         mov EAX, [EBX + 4];
>         rol EAX, 8;
>         mov [EBX + 4], EAX;
> 
>         mov EAX, [EBX + 8];
>         rol EAX, 16;
>         mov [EBX + 8], EAX;
> 
>         mov EAX, [EBX + 12];
>         rol EAX, 24;
>         mov [EBX + 12], EAX;
>     }
> }

I see you have removed the asm guard I showed you.
I suggest you benchmark it against a normal D function. Keep in mind 
that asm blocks kill inlining.
Also try a load-load-load processing-processing-processing 
store-store-store ordering instead of load-processing-store 
load-processing-store load-processing-store, because this often helps the 
pipelining of the processor (especially when you use SSE/AVX registers).
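
As a rough sketch of both suggestions, here is a plain D function you could benchmark against the asm version, written in the load-load-load / processing / store-store-store order. The names rotl and rotateWords are made up for illustration, and I am assuming the array holds at least four uints, as the offsets in your asm imply. Whether it beats the asm version depends on your compiler and flags, so measure it.

```d
// Hypothetical helper: rotate a 32-bit value left by n bits (valid for 0 < n < 32).
uint rotl(uint x, uint n) pure nothrow @nogc @safe
{
    return (x << n) | (x >> (32 - n));
}

// Plain D counterpart of the asm block. Byte offsets 4, 8 and 12
// correspond to elements 1, 2 and 3 of a uint array.
// Assumes a.length >= 4 (an assumption, not checked here beyond bounds checks).
void rotateWords(uint[] a) pure nothrow @nogc @safe
{
    // load-load-load
    uint x1 = a[1];
    uint x2 = a[2];
    uint x3 = a[3];

    // processing-processing-processing
    x1 = rotl(x1, 8);
    x2 = rotl(x2, 16);
    x3 = rotl(x3, 24);

    // store-store-store
    a[1] = x1;
    a[2] = x2;
    a[3] = x3;
}
```

Being a normal function, this one can be inlined by the compiler, which the asm version cannot.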


Bye,
bearophile
