On 18 October 2012 03:36, bearophile <bearophileh...@lycos.com> wrote: > In cryptographic code, and generally in bit-twiddling code, rotation of the > bits in a word is a common operation. It's so common that Intel has made it > an asm instruction. > > A demo program: > > > uint foo(in uint x) pure nothrow { > return (x << 11) | (x >> (32 - 11)); > } > > uint bar(uint x, in ubyte n) pure nothrow { > asm { > mov EAX, x; > mov CL, n; > rol EAX, CL; > mov x, EAX; > } > return x; > } > > void main() { > import std.stdio; > uint x = 4290772992U; > writefln("%032b", x); > uint y = foo(x); > writefln("%032b", y); > uint z = bar(x, 11); > writefln("%032b", z); > } > > > Its output, dmd -O -release -inline: > > 11111111110000000000000000000000 > 00000000000000000000011111111110 > 00000000000000000000011111111110 > > > Even with full optimizations DMD seems not able to detect the rotation in > foo(), and writing assembly as in bar() is not an option, because the code > is even longer and bar() can't be inlined. This is the ASM generated (32 > bit): > > > _D4test3fooFNaNbxkZk: > push EAX > mov ECX,[ESP] > shl EAX,0Bh > shr ECX,015h > or EAX,ECX > pop ECX > ret > > _D4test3barFNaNbkxhZk: > push EBP > mov EBP,ESP > push EAX > mov EAX,8[EBP] > mov CL,-4[EBP] > rol EAX,CL > mov 8[EBP],EAX > mov EAX,8[EBP] > mov ESP,EBP > pop EBP > ret 4 > > > GDC 4.6.3 does better, recognizing the rol (foo() can be inlined, usually > becoming 1 instruction in the middle of other code) (I like the demangling > here!): > > pure nothrow uint example.foo(const(uint)): > movl %edi, %eax > roll $11, %eax > ret > > pure nothrow uint example.bar(uint, const(ubyte)): > movl %edi, -4(%rsp) > movb %sil, -5(%rsp) > movl -4(%rsp), %eax > movb -5(%rsp), %cl > roll %cl, %eax > movl %eax, -4(%rsp) > movl -4(%rsp), %eax > ret > > > So I'd like to write: > > uint spam(in uint x) pure nothrow @safe { > import core.bitop: rol; > return rol(x, 11); > } > > > This is better because: > - It's standard. If a CPU supports rol (or ror) the compiler uses it. If it > doesn't support it, the compiler uses shifts and an or. > - It's shorter and more readable. For me "rol(x, 11)" is rather more easy to > read and debug than code like "(x << 11) | (x >> (32 - 11))". > - spam() is inlinable, just like foo() and unlike bar(). > - It doesn't rely on compiler optimizations, that sometimes are not present. > If the CPU supports the rol, the compiler doesn't need pattern matching, it > just spits out a rol. DMD currently has such optimization, but apparently > here it's not working, I don't know why. For such basic operation, that is > built in many CPUs there is no need for compiler optimizations. > > Bye, > bearophile
In the gdc-4.6 package you have there, it's only naked asm that can't be inlined. However it is worth noting that DIASM is no longer in mainline gdc. Thanks, -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';