On Nov 9, 2013, at 02:48, Ondřej Bílka <nel...@seznam.cz> wrote:
>> I've done the overflow checking in Gigi (Ada front end). Benchmarking
>> real world large Ada programs (where every integer operation is checked,
>> including array index computations etc.), I found the performance cost 
>> *very* small (less than 1% on typical code, never more than 2%). There
>> is a bit more cost in code size, but that is mostly due to the fact that
>> we want to generate error messages with correct line number information
>> without relying on backtraces.
>> 
> Overhead is mostly from additional branches that are not taken. We need
> a more accurate measure of cache effects than code size, for example
> looking at the increased number of icache hits, which will not count
> code that is never executed.

Indeed, and execution time testing shows there isn't a significant
increase in icache pressure.

>> [...]
>> A few things helped to make the cost small: the biggest one is that
>> typically one of the operands is known to be negative or positive.
>> Gigi will use Ada type information, and Natural or Positive integer
>> variables are very common.  So, if you compute X + C with C positive, 
>> you can write the conditional expression as:
> 
> On x64 the effect of this analysis is small; the processor does overflow
> detection for you.

The problem as always is that of pass ordering. If we encapsulated the
overflow check in some kind of builtin that we'd expand directly to
processor-specific instructions very late in the compilation process,
then the early optimizers could not do their work.
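For illustration (the builtin is GCC/Clang's, not something proposed in this
thread), a minimal C sketch of the encapsulated-check approach:

```c
#include <limits.h>
#include <stdbool.h>

/* Minimal sketch: GCC/Clang's __builtin_add_overflow carries the check as
 * a single operation through the middle end; the backend lowers it to an
 * add plus an overflow-flag branch only at instruction selection.
 * Returns true when a + b fits in int, storing the sum in *res. */
static bool checked_add(int a, int b, int *res)
{
    return !__builtin_add_overflow(a, b, res);
}
```

The trade-off described above is exactly this: the builtin survives intact to
the backend, which lowers it to ideal code, but early passes that only
understand plain adds, compares and control flow cannot simplify it away.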

> By just expanding out to regular additions, comparisons and control
> flow we can avoid this problem.
> 
>> (if X < Integer'Last - C then X + C else raise Constraint_Error)
>> 
> 
>> On my x86-64 this generates something like:
>> 00   cmpl    $0x7fffffff,%edi
>> 06   je      0x0000000c
>> 08   leal    0x01(%rdi),%eax
>> 0b   ret
>> [..]
> 
> This has a redundant compare instruction that costs a cycle and 6 bytes.
> You can just write
> 
>   0:  83 c7 01                add    $0x1,%edi
>   3:  71 03                   jno    0x8
> [...]
> When you know that one operand is positive, or you deal with unsigned
> values, you can replace jno with jnc, which is a bit faster on Sandy
> Bridge processors and later: the add/jnc pair is macro-fused, but
> add/jno is not.

Indeed that is ideal, assuming the condition flags are dead, but I think
the right way to get there is with late combine-like optimizations after
instruction selection.
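To make the two expansions concrete, here is a C sketch (the function names
are mine, not from the thread): the compare-before-add form used when one
operand is a known-positive constant, and the unsigned wraparound test that
maps to the add/jnc pair:

```c
#include <limits.h>

/* Compare-based expansion for X + C with C > 0 known at compile time:
 * the addition overflows iff X > INT_MAX - C, and INT_MAX - C folds to
 * a constant, so this is one compare and one branch before the add. */
static int add_pos_const_is_safe(int x, int c)
{
    return x <= INT_MAX - c;
}

/* Unsigned wraparound test: a + b wraps iff the sum is smaller than an
 * operand.  A compiler can turn this into an add followed by a
 * carry-flag branch (jnc), the macro-fusable pair described above. */
static int uadd_is_safe(unsigned a, unsigned b)
{
    return a + b >= a;
}
```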

In looking at generated code in actual programs, most checks are
optimized away. This is more important than saving a cycle here or
there in the much smaller number of checks that remain. After all,
we all "know" that our programs will never fail any overflow checks,
so it is just a matter of the compiler being smart enough to prove
this. While it won't fully achieve that goal (halting problem etc.),
for a large fraction of checks localized analysis is sufficient to
prove they cannot fail. I'm afraid that premature optimization will
always be a loss here.
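A sketch of the kind of localized proof involved (checked_inc is a
hypothetical helper, standing in for what a checking front end would emit):

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical checked increment, as a checking front end might emit it. */
static int checked_inc(int i)
{
    if (i == INT_MAX)
        abort();                /* overflow would occur */
    return i + 1;
}

/* Inside the loop, the guard i < n gives i <= n - 1 <= INT_MAX - 1, so
 * after inlining, value-range propagation can prove the abort branch
 * dead and delete it: the loop compiles as if no check were written. */
static long sum(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i = checked_inc(i))
        s += a[i];
    return s;
}
```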

Probably combine, together with some machine-specific instruction
patterns, could be taught to do some of the optimizations you
mention, and that would have the advantage of also applying to
overflow tests people write by hand.

  -Geert
