On 11 January 2014 21:42, Daniel Micay <danielmi...@gmail.com> wrote:

> On Sat, Jan 11, 2014 at 4:31 PM, Owen Shepherd <owen.sheph...@e43.eu>
> wrote:
> > So I just did a test. Took the following rust code:
> > pub fn test_wrap(x : u32, y : u32) -> u32 {
> >     return x.checked_mul(&y).unwrap().checked_add(&16).unwrap();
> > }
> >
> > And got the following blob of assembly out. What we have there, my
> friends,
> > is a complete failure of the optimizer (N.B. it works for the simple
> case of
> > checked_add alone)
> >
> > Preamble:
> >
> > __ZN9test_wrap19hc4c136f599917215af4v0.0E:
> >     .cfi_startproc
> >     cmpl    %fs:20, %esp
> >     ja    LBB0_2
> >     pushl    $12
> >     pushl    $20
> >     calll    ___morestack
> >     ret
> > LBB0_2:
> >     pushl    %ebp
> > Ltmp2:
> >     .cfi_def_cfa_offset 8
> > Ltmp3:
> >     .cfi_offset %ebp, -8
> >     movl    %esp, %ebp
> > Ltmp4:
> >     .cfi_def_cfa_register %ebp
> >
> > Align stack (for what? We don't do any SSE)
> >
> >     andl    $-8, %esp
> >     subl    $16, %esp
>
> The compiler aligns the stack for performance.
>
>

Oops, I misread and thought there was 16-byte alignment going on there, not
8.


> > Multiply x * y
> >
> >     movl    12(%ebp), %eax
> >     mull    16(%ebp)
> >     jno    LBB0_4
> >
> > If it didn't overflow, stash a 0 at top of stack
> >
> >     movb    $0, (%esp)
> >     jmp    LBB0_5
> >
> > If it did overflow, stash a 1 at top of stack (we are building an
> > Option<u32> here)
> > LBB0_4:
> >     movb    $1, (%esp)
> >     movl    %eax, 4(%esp)
> >
> > Take pointer to &this for __thiscall:
> > LBB0_5:
> >     leal    (%esp), %ecx
> >     calll    __ZN6option6Option6unwrap21h05c5cb6c47a61795Zcat4v0.0E
> >
> > Do the addition to the result
> >
> >     addl    $16, %eax
> >
> > Repeat the previous circus
> >
> >     jae    LBB0_7
> >     movb    $0, 8(%esp)
> >     jmp    LBB0_8
> > LBB0_7:
> >     movb    $1, 8(%esp)
> >     movl    %eax, 12(%esp)
> > LBB0_8:
> >     leal    8(%esp), %ecx
> >     calll    __ZN6option6Option6unwrap21h05c5cb6c47a61795Zcat4v0.0E
> >     movl    %ebp, %esp
> >     popl    %ebp
> >     ret
> >     .cfi_endproc
> >
> >
> > Yeah. It's not fast because it's not inlining through option::unwrap.
>
> The code to initiate failure is gigantic and LLVM doesn't do partial
> inlining by default. It's likely far above the inlining threshold.
>
>
Right, which is why I suggested explicitly moving the failure code out of
line into a separate function.
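
Something like the following is what I have in mind (just a sketch, with
made-up names, and writing checked_mul/checked_add as taking their argument
by value for brevity):

#[inline(never)]
#[cold]
fn overflow_abort() -> ! {
    // All of the heavy failure machinery lives out of line here, so callers
    // only pay a single call instruction on the cold path.
    panic!("arithmetic overflow");
}

pub fn test_wrap(x: u32, y: u32) -> u32 {
    // The hot path is now small enough for LLVM to inline and clean up.
    let m = match x.checked_mul(y) {
        Some(v) => v,
        None => overflow_abort(),
    };
    match m.checked_add(16) {
        Some(v) => v,
        None => overflow_abort(),
    }
}

With the failure path reduced to one never-inlined call, the checked
operations should boil down to a mul and an add plus a branch each to
overflow_abort.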


> A purely synthetic benchmark only executing the unchecked or checked
> instruction isn't interesting. You need to include several
> optimizations in the loop as real code would use, and you will often
> see a massive drop in performance from the serialization of the
> pipeline. Register renaming is not as clever as you'd expect.
>
>
Agreed. The variability within that tiny benchmark tells me that no valuable
information can really be gleaned from it.
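
Something closer to useful would be a loop with a chain of dependent
operations rather than one checked instruction timed in isolation; a rough
sketch (names and constants are arbitrary):

use std::hint::black_box;
use std::time::Instant;

fn main() {
    let data: Vec<u32> = (1..1_000_000u32).collect();
    let start = Instant::now();
    let mut acc: u32 = 0;
    for &x in black_box(&data) {
        // Both the multiply and the add go through the checks, and each
        // result feeds the next iteration, so the overflow branches sit on
        // the dependency chain instead of being hoisted away.
        let m = x.checked_mul(3).unwrap();
        acc = acc.checked_add(m % 7).unwrap() ^ m;
    }
    println!("{} in {:?}", black_box(acc), start.elapsed());
}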


> The impact of trapping is known, because `clang` and `gcc` expose
> `-ftrapv`.
>  Integer-heavy workloads like cryptography and video codecs are
> several times slower with the checks.
>

What about other workloads?

As I mentioned, what I'd propose is trapping by default, with non-trapping
math only a single additional character on a type declaration away.
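
Purely to illustrate the shape of it (hypothetical; the real thing would
presumably be a built-in type or suffix rather than a library newtype, the
point is just that overflow behaviour is chosen on the type, not per
operation):

use std::ops::Add;

// Hypothetical "wrapping u32": plain u32 would trap on overflow, and
// opting into wraparound happens once, in the type.
#[derive(Clone, Copy)]
struct W32(u32);

impl Add for W32 {
    type Output = W32;
    fn add(self, rhs: W32) -> W32 {
        // Explicit two's-complement wraparound, no overflow check.
        W32(self.0.wrapping_add(rhs.0))
    }
}

fn hash_step(acc: W32, x: W32) -> W32 {
    // Never traps, by choice of type rather than per call.
    acc + x
}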

Also, I did manage to convince Rust + LLVM to optimize things cleanly, by
defining an unwrap which invoked libc's abort() -> !, so there's that.
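
Roughly this shape (a sketch, not the exact code, and again writing the
checked ops as by-value for brevity):

extern "C" {
    fn abort() -> !;
}

fn unwrap_or_abort<T>(x: Option<T>) -> T {
    match x {
        Some(v) => v,
        // abort() is noreturn and costs one call instruction, so the whole
        // function stays small enough for LLVM to inline.
        None => unsafe { abort() },
    }
}

pub fn test_wrap(x: u32, y: u32) -> u32 {
    unwrap_or_abort(unwrap_or_abort(x.checked_mul(y)).checked_add(16))
}

With that, the checked version optimizes down to the plain mul/add plus
conditional jumps to abort.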