On 11 January 2014 21:42, Daniel Micay <danielmi...@gmail.com> wrote:
> On Sat, Jan 11, 2014 at 4:31 PM, Owen Shepherd <owen.sheph...@e43.eu>
> wrote:
> > So I just did a test. Took the following Rust code:
> >
> > pub fn test_wrap(x : u32, y : u32) -> u32 {
> >     return x.checked_mul(&y).unwrap().checked_add(&16).unwrap();
> > }
> >
> > And got the following blob of assembly out. What we have there, my
> > friends, is a complete failure of the optimizer (N.B. it works for the
> > simple case of checked_add alone).
> >
> > Preamble:
> >
> > __ZN9test_wrap19hc4c136f599917215af4v0.0E:
> >     .cfi_startproc
> >     cmpl    %fs:20, %esp
> >     ja      LBB0_2
> >     pushl   $12
> >     pushl   $20
> >     calll   ___morestack
> >     ret
> > LBB0_2:
> >     pushl   %ebp
> > Ltmp2:
> >     .cfi_def_cfa_offset 8
> > Ltmp3:
> >     .cfi_offset %ebp, -8
> >     movl    %esp, %ebp
> > Ltmp4:
> >     .cfi_def_cfa_register %ebp
> >
> > Align stack (for what? We don't do any SSE):
> >
> >     andl    $-8, %esp
> >     subl    $16, %esp
>
> The compiler aligns the stack for performance.

Oops, I misread and thought there was 16-byte alignment going on there,
not 8.

> > Multiply x * y:
> >
> >     movl    12(%ebp), %eax
> >     mull    16(%ebp)
> >     jno     LBB0_4
> >
> > If it did overflow, stash a 0 (None) at top of stack:
> >
> >     movb    $0, (%esp)
> >     jmp     LBB0_5
> >
> > If it didn't overflow, stash a 1 and the result (we are building an
> > Option<u32> here):
> >
> > LBB0_4:
> >     movb    $1, (%esp)
> >     movl    %eax, 4(%esp)
> >
> > Take pointer to &this for __thiscall:
> >
> > LBB0_5:
> >     leal    (%esp), %ecx
> >     calll   __ZN6option6Option6unwrap21h05c5cb6c47a61795Zcat4v0.0E
> >
> > Do the addition to the result:
> >
> >     addl    $16, %eax
> >
> > Repeat the previous circus:
> >
> >     jae     LBB0_7
> >     movb    $0, 8(%esp)
> >     jmp     LBB0_8
> > LBB0_7:
> >     movb    $1, 8(%esp)
> >     movl    %eax, 12(%esp)
> > LBB0_8:
> >     leal    8(%esp), %ecx
> >     calll   __ZN6option6Option6unwrap21h05c5cb6c47a61795Zcat4v0.0E
> >     movl    %ebp, %esp
> >     popl    %ebp
> >     ret
> >     .cfi_endproc
> >
> > Yeah. It's not fast because it's not inlining through option::unwrap.
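As an aside: chaining through Option instead of unwrapping twice leaves
the optimizer a single failure path to deal with. A sketch, written with
the by-value checked_mul/checked_add signatures rather than the
&-taking ones quoted above:

```rust
// Same computation as test_wrap above, but threaded through Option with
// and_then, so there is only one unwrap (and one failure branch) at the end.
pub fn test_wrap(x: u32, y: u32) -> u32 {
    x.checked_mul(y)                     // Option<u32>: None on overflow
        .and_then(|p| p.checked_add(16)) // still Option<u32>
        .expect("arithmetic overflow")
}

fn main() {
    assert_eq!(test_wrap(3, 5), 31); // 3 * 5 + 16
}
```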
> The code to initiate failure is gigantic and LLVM doesn't do partial
> inlining by default. It's likely far above the inlining threshold.

Right, which is why I suggested explicitly moving the failure code out of
line into a separate function.

> A purely synthetic benchmark only executing the unchecked or checked
> instruction isn't interesting. You need to include several
> optimizations in the loop as real code would use, and you will often
> see a massive drop in performance from the serialization of the
> pipeline. Register renaming is not as clever as you'd expect.

Agreed. The variability within that tiny benchmark tells me that no
valuable information can really be gleaned from it.

> The impact of trapping is known, because `clang` and `gcc` expose
> `-ftrapv`. Integer-heavy workloads like cryptography and video codecs
> are several times slower with the checks.

What about other workloads? As I mentioned, what I'd propose is trapping
by default, with non-trapping math only a single additional character on
a type declaration away.

Also, I did manage to convince Rust + LLVM to optimize things cleanly by
defining an unwrap which invoked libc's abort() -> !, so there's that.
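For concreteness, a sketch of that abort()-based unwrap in current Rust.
The names unwrap_or_abort and overflow_abort are made up for
illustration, and std::process::abort() stands in for calling libc's
abort() directly; the point is that the diverging failure path lives in
its own never-inlined function, so the hot path LLVM must inline is tiny:

```rust
use std::process;

// Cold, out-of-line failure path: diverges (-> !) like libc abort(),
// so none of the failure machinery lands in the caller.
#[cold]
#[inline(never)]
fn overflow_abort() -> ! {
    process::abort()
}

// Tiny hot path: a branch and a move, easy for LLVM to inline.
#[inline(always)]
fn unwrap_or_abort(v: Option<u32>) -> u32 {
    match v {
        Some(x) => x,
        None => overflow_abort(),
    }
}

pub fn test_wrap(x: u32, y: u32) -> u32 {
    unwrap_or_abort(unwrap_or_abort(x.checked_mul(y)).checked_add(16))
}

fn main() {
    assert_eq!(test_wrap(2, 4), 24); // 2 * 4 + 16
}
```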
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev