Re: Bits rotations
On 18 October 2012 03:36, bearophile bearophileh...@lycos.com wrote: In cryptographic code, and generally in bit-twiddling code, rotation of the bits in a word is a common operation. It's so common that Intel has made it an asm instruction. A demo program: uint foo(in uint x) pure nothrow { return (x 11) | (x (32 - 11)); } uint bar(uint x, in ubyte n) pure nothrow { asm { mov EAX, x; mov CL, n; rol EAX, CL; mov x, EAX; } return x; } void main() { import std.stdio; uint x = 4290772992U; writefln(%032b, x); uint y = foo(x); writefln(%032b, y); uint z = bar(x, 11); writefln(%032b, z); } Its output, dmd -O -release -inline: 1100 0110 0110 Even with full optimizations DMD seems not able to detect the rotation in foo(), and writing assembly as in bar() is not an option, because the code is even longer and bar() can't be inlined. This is the ASM generated (32 bit): _D4test3fooFNaNbxkZk: pushEAX mov ECX,[ESP] shl EAX,0Bh shr ECX,015h or EAX,ECX pop ECX ret _D4test3barFNaNbkxhZk: pushEBP mov EBP,ESP pushEAX mov EAX,8[EBP] mov CL,-4[EBP] rol EAX,CL mov 8[EBP],EAX mov EAX,8[EBP] mov ESP,EBP pop EBP ret 4 GDC 4.6.3 does better, recognizing the rol (foo() can be inlined, usually becoming 1 instruction in the middle of other code) (I like the demangling here!): pure nothrow uint example.foo(const(uint)): movl%edi, %eax roll$11, %eax ret pure nothrow uint example.bar(uint, const(ubyte)): movl %edi, -4(%rsp) movb %sil, -5(%rsp) movl -4(%rsp), %eax movb -5(%rsp), %cl roll %cl, %eax movl %eax, -4(%rsp) movl -4(%rsp), %eax ret So I'd like to write: uint spam(in uint x) pure nothrow @safe { import core.bitop: rol; return rol(x, 11); } This is better because: - It's standard. If a CPU supports rol (or ror) the compiler uses it. If it doesn't support it, the compiler uses shifts and an or. - It's shorter and more readable. For me rol(x, 11) is rather more easy to read and debug than code like (x 11) | (x (32 - 11)). - spam() is inlinable, just like foo() and unlike bar(). - It doesn't rely on compiler optimizations, that sometimes are not present. If the CPU supports the rol, the compiler doesn't need pattern matching, it just spits out a rol. DMD currently has such optimization, but apparently here it's not working, I don't know why. For such basic operation, that is built in many CPUs there is no need for compiler optimizations. Bye, bearophile In the gdc-4.6 package you have there, it's only naked asm that can't be inlined. However it is worth noting that DIASM is no longer in mainline gdc. Thanks, -- Iain Buclaw *(p e ? p++ : p) = (c 0x0f) + '0';
Re: Bits rotations
On 18 October 2012 09:27, bearophile bearophileh...@lycos.com wrote: Iain Buclaw: In the gdc-4.6 package you have there, it's only naked asm that can't be inlined. Good. However it is worth noting that DIASM is no longer in mainline gdc. What's DIASM? Is it the D syntax for asm code? If this is right, then gdc developers have done a mistake, reducing D code interoperability, creating an incompatibility where there wasn't (and reducing my desire to use gdc or to switch to it, because I have hundreds of lines of inlined asm in my D code), this means doing the opposite of what generally compiler writers are supposed to do (maybe this topic was discussed already, in past). Bye, bearophile This topic has been discussed in the past. And the current status is that GCC mainline has poisoned the frontend to use certain headers that the IASM implementation in GDC depended on. Example: int zz(int p1) { asm { naked; mov EAX, p1[EBP]; } } To calculate p1[EBP], one would have to know where p1 will land on the frame pointer to replace it with the relavant offset value. This would mean from the front-end we would have to invoke the back-end to generate and tell us the stack frame layout of zz, which is not possible because: a) Invoking this before the optimisation passes may produce a different result to what that actual result is after the optimisation passes. b) All functions are sitting in poisoned (for the front-end) headers. There is an opportunity to defer parsing IASM until the GIMPLE (middle-end) stage, however am still unable to retrieve the required information to produce the correct codegen. Regards -- Iain Buclaw *(p e ? p++ : p) = (c 0x0f) + '0';
Re: Bits rotations
On 18/10/12 11:39, Iain Buclaw wrote: On 18 October 2012 09:27, bearophile bearophileh...@lycos.com wrote: Iain Buclaw: In the gdc-4.6 package you have there, it's only naked asm that can't be inlined. Good. However it is worth noting that DIASM is no longer in mainline gdc. What's DIASM? Is it the D syntax for asm code? If this is right, then gdc developers have done a mistake, reducing D code interoperability, creating an incompatibility where there wasn't (and reducing my desire to use gdc or to switch to it, because I have hundreds of lines of inlined asm in my D code), this means doing the opposite of what generally compiler writers are supposed to do (maybe this topic was discussed already, in past). Bye, bearophile This topic has been discussed in the past. And the current status is that GCC mainline has poisoned the frontend to use certain headers that the IASM implementation in GDC depended on. Example: int zz(int p1) { asm { naked; mov EAX, p1[EBP]; } } To calculate p1[EBP], one would have to know where p1 will land on the frame pointer to replace it with the relavant offset value. This would mean from the front-end we would have to invoke the back-end to generate and tell us the stack frame layout of zz, which is not possible because: FYI: That code doesn't work in DMD either. DMD assumes a frame pointer is created in naked ASM, which is totally wrong. Code like that should not compile. The compiler does not know what the correct offsets are and should not attempt to try. a) Invoking this before the optimisation passes may produce a different result to what that actual result is after the optimisation passes. b) All functions are sitting in poisoned (for the front-end) headers. There is an opportunity to defer parsing IASM until the GIMPLE (middle-end) stage, however am still unable to retrieve the required information to produce the correct codegen. Are you just talking about naked asm? Conceptually naked asm should act as if it was created in an assembler in a seperate obj file, and accessed via extern(C). If you have problems with non-naked asm, that would make more sense to me.
Re: Bits rotations
On 18 October 2012 15:22, Don Clugston d...@nospam.com wrote: On 18/10/12 11:39, Iain Buclaw wrote: On 18 October 2012 09:27, bearophile bearophileh...@lycos.com wrote: Iain Buclaw: In the gdc-4.6 package you have there, it's only naked asm that can't be inlined. Good. However it is worth noting that DIASM is no longer in mainline gdc. What's DIASM? Is it the D syntax for asm code? If this is right, then gdc developers have done a mistake, reducing D code interoperability, creating an incompatibility where there wasn't (and reducing my desire to use gdc or to switch to it, because I have hundreds of lines of inlined asm in my D code), this means doing the opposite of what generally compiler writers are supposed to do (maybe this topic was discussed already, in past). Bye, bearophile This topic has been discussed in the past. And the current status is that GCC mainline has poisoned the frontend to use certain headers that the IASM implementation in GDC depended on. Example: int zz(int p1) { asm { naked; mov EAX, p1[EBP]; } } To calculate p1[EBP], one would have to know where p1 will land on the frame pointer to replace it with the relavant offset value. This would mean from the front-end we would have to invoke the back-end to generate and tell us the stack frame layout of zz, which is not possible because: FYI: That code doesn't work in DMD either. DMD assumes a frame pointer is created in naked ASM, which is totally wrong. Code like that should not compile. The compiler does not know what the correct offsets are and should not attempt to try. a) Invoking this before the optimisation passes may produce a different result to what that actual result is after the optimisation passes. b) All functions are sitting in poisoned (for the front-end) headers. There is an opportunity to defer parsing IASM until the GIMPLE (middle-end) stage, however am still unable to retrieve the required information to produce the correct codegen. Are you just talking about naked asm? Conceptually naked asm should act as if it was created in an assembler in a seperate obj file, and accessed via extern(C). If you have problems with non-naked asm, that would make more sense to me. Normal assembler... naked assembler has its own set of problems (requires patching in a naked style attribute which the x86 GCC maintainers rejected outrightly). -- Iain Buclaw *(p e ? p++ : p) = (c 0x0f) + '0';
Re: Bits rotations
On Thursday, 18 October 2012 at 02:36:59 UTC, bearophile wrote: uint foo(in uint x) pure nothrow { return (x 11) | (x (32 - 11)); } Why would you write that instead of using rol(x, 11) today? uint rol(in uint x, in uint y) pure nothrow { return (x y) | (x (32 - y)); } uint foo(in uint x) pure nothrow { return rol(x, 11); } the rest is the same. Compile it and see a rol instruction, inlined, in the main function. Though there is a bit of extra spam around it, some movs that don't seem necessary to me. Here's the top function so you can see the movs: _D5test43rolFNaNbxkxkZk: 0: 55 push ebp 1: 8b ec movebp,esp 3: 50 push eax 4: 8b 45 08moveax,DWORD PTR [ebp+0x8] 7: 8b 4d fcmovecx,DWORD PTR [ebp-0x4] a: 8b e5 movesp,ebp c: d3 c0 roleax,cl e: 5d popebp f: c2 04 00ret0x4 12: 90 nop 13: 90 nop Perhaps we should add that rol function to the stdlib to save people from quickly doing it themselves, but there's no need to do anything beyond this simple function.
Re: Bits rotations
On Thu, Oct 18, 2012 at 04:55:38AM +0200, Adam D. Ruppe wrote: On Thursday, 18 October 2012 at 02:36:59 UTC, bearophile wrote: uint foo(in uint x) pure nothrow { return (x 11) | (x (32 - 11)); } Why would you write that instead of using rol(x, 11) today? uint rol(in uint x, in uint y) pure nothrow { return (x y) | (x (32 - y)); } [...] Perhaps we should add that rol function to the stdlib to save people from quickly doing it themselves, but there's no need to do anything beyond this simple function. +1, add this to core.bitop. T -- If you compete with slaves, you become a slave. -- Norbert Wiener