On 11 January 2014 06:20, Daniel Micay <danielmi...@gmail.com> wrote:

> The branch on the overflow flag results in a very significant loss in
> performance. For example, I had to carefully write the vector `push`
> method for my `Vec<T>` type to only perform one overflow check. With
> two checks, it's over 5 times slower due to failed branch predictions.
>

What did the generated code look like? I suspect that LLVM wasn't
generating optimal code, perhaps because Rust wasn't giving it appropriate
hints or because of optimizer bugs. For reference, on AMD64 the code should
look something like the following hypothetical code:

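vec_allocate:
    MOV $SIZE, %eax         # element size (writing %eax zero-extends into %rax)
    MUL %rsi                # %rdx:%rax = %rax * element count; CF set on overflow
    JC Lerror               # forward branch, expected not taken
    ADD $HEADER_SIZE, %rax  # add the header size; CF set on overflow
    JC Lerror
    MOV %rax, %rdi          # malloc takes its size in %rdi (System V AMD64 ABI)
    JMP malloc              # tail call
Lerror:
    # Code to raise error here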
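
For comparison, the same two-check size computation can be sketched in Rust with the standard checked arithmetic methods. This is only an illustration: ELEM_SIZE and HEADER_SIZE are made-up stand-ins for $SIZE and $HEADER_SIZE above, and alloc_size is a hypothetical name, not anything from the Vec<T> code under discussion.

```rust
/// Hypothetical stand-ins for $SIZE and $HEADER_SIZE in the assembly above.
const ELEM_SIZE: usize = 8;
const HEADER_SIZE: usize = 16;

/// Returns the number of bytes to allocate for `count` elements plus the
/// header, or `None` if either the multiply or the add overflows.
fn alloc_size(count: usize) -> Option<usize> {
    count
        .checked_mul(ELEM_SIZE)? // first overflow check (the MUL / JC pair)
        .checked_add(HEADER_SIZE) // second overflow check (the ADD / JC pair)
}
```

Whether this compiles down to two flag-based branches like the listing above depends on the compiler version and optimization level, so the generated code is still worth inspecting.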

Note that the ordering is EXTREMELY important! x86 doesn't give you any
separate branch hints (excluding the two obsolete hint prefixes, which only
the Pentium 4 ever honored), so your only clue to the branch predictor is
the branch direction.

I suspect your generated code had forward branches for the no-overflow
case. That's absolutely no good (codegen inserting "islands" of failure-case
code); it will screw up the branch predictor.

x86 defaults to predicting all (conditional) forward jumps not taken and all
conditional backward jumps taken (loops!). If the optimizer wasn't told
which path is the likely one, it probably won't have laid the code out to
match.
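
One way to tell the optimizer which path is unlikely in current Rust is to move the failure path into a separate function marked #[cold]. The sketch below assumes that approach. The names capacity_overflow and checked_alloc_size are hypothetical, and the attribute is only a hint, so the resulting block layout is up to LLVM.

```rust
/// Out-of-line failure path. `#[cold]` marks the function as rarely called;
/// `#[inline(never)]` keeps its body out of the caller's hot code.
#[cold]
#[inline(never)]
fn capacity_overflow() -> ! {
    panic!("capacity overflow");
}

/// Hypothetical helper: bytes needed for `count` elements of `elem_size`
/// bytes each, plus `header` bytes. Panics if the computation overflows.
fn checked_alloc_size(count: usize, elem_size: usize, header: usize) -> usize {
    match count.checked_mul(elem_size).and_then(|n| n.checked_add(header)) {
        Some(bytes) => bytes,        // likely path
        None => capacity_overflow(), // unlikely path, kept out of line
    }
}
```

Calling checked_alloc_size(4, 8, 16) returns 48, while an overflowing request such as checked_alloc_size(usize::MAX, 2, 0) ends up in the cold function and panics.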

Since the overflow case should basically never be hit, there is no
reason for its code ever to be loaded into the instruction cache, which is
good.

(P.S. If the Rust compiler is really good, it'll convince LLVM to put the
error-case branch code in a separate section so it can all be packed
together, far away from useful cache lines and TLB entries.)
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
