On 11 January 2014 06:20, Daniel Micay <danielmi...@gmail.com> wrote:
> The branch on the overflow flag results in a very significant loss in
> performance. For example, I had to carefully write the vector `push`
> method for my `Vec<T>` type to only perform one overflow check. With
> two checks, it's over 5 times slower due to failed branch predictions.
>

What did the generated code look like? I suspect that LLVM wasn't generating optimal code, perhaps because Rust wasn't giving it appropriate hints or because of optimizer bugs. For reference, on AMD64 the code should look something like the following hypothetical code:

vec_allocate:
    MOV $SIZE, %eax
    MUL %rsi
    JC Lerror
    ADD $HEADER_SIZE, %rax
    JC Lerror
    MOV %rax, %rsi
    JMP malloc
Lerror:
    // Code to raise error here

Note that the ordering is EXTREMELY important! x86 doesn't give you any separate branch hints (excluding two obsolete ones which only the Pentium IV ever cared about), so your only clue to the optimizer is the branch direction.

I suspect your generated code had forward branches for the no-overflow case. That's absolutely no good (codegen inserting "islands" of failure-case code); it will screw up the branch predictor. x86 defaults to predicting all conditional forward jumps as not taken and all conditional backward jumps as taken (loops!). If the optimizer wasn't informed correctly, it will probably not have obeyed that.

Since the overflow case should basically never be hit, there is no reason for it to ever be loaded into the optimizer, so that is good.

(P.S. If the Rust compiler is really good, it'll convince LLVM to put the error-case branch code in a separate section, so it can all be packed together far away from useful cache lines and TLB entries.)
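For comparison, here is a rough sketch of the same size computation in present-day Rust, using the standard `checked_mul` and `checked_add` methods and an out-of-line error function marked `#[cold]`. The names `ELEM_SIZE`, `HEADER_SIZE`, `capacity_overflow` and `alloc_size` are made up for illustration (the two constants stand in for `$SIZE` and `$HEADER_SIZE` in the assembly above), and their values are arbitrary. `#[cold]` is only a hint: whether the compiler actually lays out the failure path after the fall-through path, or in a separate section, is not guaranteed.

```rust
/// Hypothetical element and header sizes (illustrative values only).
const ELEM_SIZE: usize = 8;
const HEADER_SIZE: usize = 16;

/// Out-of-line failure path. `#[cold]` tells the compiler this function is
/// unlikely to be called, and `#[inline(never)]` keeps its body out of the
/// caller, so the hot path can stay compact.
#[cold]
#[inline(never)]
fn capacity_overflow() -> Option<usize> {
    None
}

/// Computes `HEADER_SIZE + count * ELEM_SIZE`, returning `None` if either
/// the multiplication or the addition overflows `usize`.
fn alloc_size(count: usize) -> Option<usize> {
    // First check: count * ELEM_SIZE (the MUL / JC pair in the assembly).
    let bytes = match count.checked_mul(ELEM_SIZE) {
        Some(b) => b,
        None => return capacity_overflow(),
    };
    // Second check: + HEADER_SIZE (the ADD / JC pair in the assembly).
    match bytes.checked_add(HEADER_SIZE) {
        Some(total) => Some(total),
        None => capacity_overflow(),
    }
}
```

With these constants, `alloc_size(4)` yields `Some(48)`, while `alloc_size(usize::MAX)` fails the multiplication check and `alloc_size(usize::MAX / 8)` fails the addition check, both returning `None`.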
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev