http://d.puremagic.com/issues/show_bug.cgi?id=9963

           Summary: Absurdly Inefficient Codegen For Adding Boolean Predicates
           Product: D
           Version: D1 & D2
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: performance
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nob...@puremagic.com
        ReportedBy: dsim...@yahoo.com


--- Comment #0 from David Simcha <dsim...@yahoo.com> 2013-04-19 12:25:32 PDT ---
D source Code:

    __gshared ulong n_less = 0, n_greater = 0;

    void doConditional(ubyte thresh, ubyte[] arr) {
        ulong l, g;
        foreach(val; arr) {
            l += (thresh < val);
            g += !(thresh < val);
        }

        n_less += l;
        n_greater += g;
    }

DMD-generated ASM code (foreach loop only, from obj2asm, when compiled with
-O -inline -release):

    L33:    mov     RDX,-018h[RBP]
            mov     CL,[RDX][R8]
            cmp     CL,R9B
            mov     EAX,1
            ja      L47
            xor     EAX,EAX
    L47:    cdqe
            add     R11,RAX
            cmp     R9B,CL
            sbb     EAX,EAX
            inc     EAX
            cdqe
            add     RBX,RAX
            inc     R8
            cmp     R8,-010h[RBP]
            jb      L33

Why use sbb + neg + two cmp instructions instead of just using setb and setae?
This executes in about 0.495 seconds for an array of 100 million elements.

GCC's codegen for the same function:

    L20:    movzx   ECX,[RAX][RDX]
            xor     R10D,R10D
            cmp     ECX,EDI
            setnle  R10B
            add     R9,R10
            cmp     ECX,EDI
            setle   CL
            add     RAX,1
            movzx   ECX,CL
            add     R8,RCX
            cmp     RAX,RSI
            jne     L20

This executes in about 0.095 seconds for an array of 100 million elements.

My hand-compilation for this loop:

    LStart:
        cmp DL, byte ptr [RAX];
        setae R9B;
        adc R10, 0;
        inc RAX;
        add R11, R9;
        cmp RAX, RBX;
        jb LStart;

This executes in about 0.071 seconds for an array of 100 million elements.
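
For anyone wanting to reproduce the timings, a minimal harness along the following lines might do. This is only a sketch, not the benchmark actually used for the numbers above: the input data (pseudo-random bytes from a simple LCG), the threshold of 128, and the use of std.datetime.stopwatch from a current Phobos are all assumptions. Only doConditional itself is taken from the report.

```d
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

__gshared ulong n_less = 0, n_greater = 0;

// Function under test, as given in the report.
void doConditional(ubyte thresh, ubyte[] arr) {
    ulong l, g;
    foreach (val; arr) {
        l += (thresh < val);   // bool implicitly converts to 0 or 1
        g += !(thresh < val);
    }

    n_less += l;
    n_greater += g;
}

void main() {
    // Assumption: 100 million elements, matching the size quoted in the report.
    auto arr = new ubyte[](100_000_000);

    // Assumption: fill with pseudo-random bytes from a simple LCG so the
    // predicate is not trivially predictable. The original input is unknown.
    uint state = 12345;
    foreach (ref b; arr) {
        state = state * 1664525u + 1013904223u;
        b = cast(ubyte)(state >> 24);
    }

    // Time only the call to doConditional, not the array setup.
    auto sw = StopWatch(AutoStart.yes);
    doConditional(128, arr);   // threshold 128 is an arbitrary choice
    sw.stop();

    // Every element is counted by exactly one of the two predicates.
    assert(n_less + n_greater == arr.length);

    writefln("n_less=%s n_greater=%s elapsed=%s ms",
             n_less, n_greater, sw.peek.total!"msecs");
}
```

Built with the same flags as in the report (-O -inline -release), this should exercise the loop shown in the DMD listing. Absolute times will of course depend on the machine and compiler version.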