Re: [LLVMdev] Handling of pointer difference in llvm-gcc and clang

Richard Guenther Thu, 11 Aug 2011 10:11:56 -0700

On Thu, Aug 11, 2011 at 6:05 PM, Florian Merz <[email protected]> wrote:
> Thanks for your reply, Richard, but I'm not satisfied with your answer yet. :-)
> If I'm right, then the problem I'm referring to doesn't require large objects.
>
> See below for more.
>
> On Thursday, 11 August 2011, 17:48:26, Richard Guenther wrote:
>> On Thu, Aug 11, 2011 at 5:15 PM, Florian Merz <[email protected]> wrote:
>> > Dear gcc developers,
>> >
>> > this is about an issue that popped up in a verification project [1] based
>> > on LLVM, but it seems to be already present in the gimple code, before
>> > llvm-gcc transforms the gimple code to LLVM-IR.
>> >
>> > In short:
>> > Calculating the difference of two pointers seems to be treated by gcc as
>> > a signed integer subtraction. While the result should be of type
>> > ptrdiff_t and therefore signed, we believe the subtraction itself should
>> > not be signed.
>> >
>> > Signed subtraction might overflow if a large positive number is
>> > subtracted from a large negative number. So subtracting for example from
>> > the pointer value 0x80...0 (a large negative signed integer) the pointer
>> > value 0x7F...F (a large positive signed integer) should in theory be
>> > perfectly fine, but treating this as a signed subtraction causes an
>> > overflow and therefore undefined behaviour.
>> >
>> > Can someone explain why this is treated as a signed subtraction?
>>
>> GCC restricts objects to the size of half of the address-space thus
>> a valid pointer subtraction in C cannot overflow.
>
> Consider an array containing 8 bytes starting at 0x7FFFFFFC. This array would
> go up to one less than 0x80000004.
>
> If I remember the standard correctly, pointer subtraction is valid if both
> pointers point to elements of the same array or to one past the last element
> of the array. According to this 0x80000000 - 0x7FFFFFFF should be a valid
> pointer subtraction with the result 0x00000001.
>
> But if the subtraction is treated as signed, this would be a signed integer
> overflow, as we subtract INT_MAX from INT_MIN, which surely must overflow, and
> the result therefore would be undefined.


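/* Assumes a 32-bit target (e.g. gcc -m32 -ftrapv); with 64-bit
   pointers the subtraction below cannot overflow. */
int x, y;   /* zero at startup, but the compiler cannot fold the branches */
int main ()
{
  char *a, *b;
  __INTPTR_TYPE__ w;
  if (x)
    a = (char *) 0x7ffffffe;
  else
    a = (char *) 0x7fffffff;
  if (y)
    b = (char *) 0x80000001;
  else
    b = (char *) 0x80000000;
  w = b - a;   /* 0x80000000 - 0x7fffffff: the pointers are one byte apart */
  return w;
}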

This indeed traps with -ftrapv for me, which suggests you are right.

Joseph?

Richard.

>> Richard.
>>
>> > Thanks a lot and regards,
>> >  Florian
>> >
>> > P.S.: It seems that clang does not treat this subtraction as signed.
>> >
>> > [1] http://baldur.iti.kit.edu/llbmc/
>> >
>> > ----------  Forwarded Message  ----------
>> >
>> > Subject: Re: [LLVMdev] Handling of pointer difference in llvm-gcc and clang
>> > Date: Wednesday, 10 August 2011, 19:12:43
>> > From: Jack Howarth <[email protected]>
>> > To: Duncan Sands <[email protected]>
>> > Cc: [email protected]
>> >
>> > On Wed, Aug 10, 2011 at 06:13:16PM +0200, Duncan Sands wrote:
>> >> Hi Stephan,
>> >>
>> >> > We are developing a bounded model checker for C/C++ programs
>> >> > (http://baldur.iti.kit.edu/llbmc/) that operates on LLVM's
>> >> > intermediate representation.  While checking a C++ program that uses
>> >> > STL containers we noticed that llvm-gcc and clang handle pointer
>> >> > differences in disagreeing ways.
>> >> >
>> >> > Consider the following C function:
>> >> > int f(int *p, int *q)
>> >> > {
>> >> >       return q - p;
>> >> > }
>> >> >
>> >> > Here's the LLVM code generated by llvm-gcc (2.9):
>> >> > define i32 @f(i32* %p, i32* %q) nounwind readnone {
>> >> > entry:
>> >> >     %0 = ptrtoint i32* %q to i32
>> >> >     %1 = ptrtoint i32* %p to i32
>> >> >     %2 = sub nsw i32 %0, %1
>> >> >     %3 = ashr exact i32 %2, 2
>> >> >     ret i32 %3
>> >> > }
>> >> >
>> >> > And here is what clang (2.9) produces:
>> >> > define i32 @f(i32* %p, i32* %q) nounwind readnone {
>> >> >     %1 = ptrtoint i32* %q to i32
>> >> >     %2 = ptrtoint i32* %p to i32
>> >> >     %3 = sub i32 %1, %2
>> >> >     %4 = ashr exact i32 %3, 2
>> >> >     ret i32 %4
>> >> > }
>> >> >
>> >> > Thus, llvm-gcc added the nsw flag to the sub, whereas clang didn't.
>> >> >
>> >> > We think that clang is right and llvm-gcc is wrong:  it could be the
>> >> > case that p and q point into the same array, that q is 0x80000000, and
>> >> > that p is 0x7FFFFFFE.  Then the sub results in a signed overflow,
>> >> > i.e., the sub with nsw yields a trap value.
>> >> >
>> >> > Is this a bug in llvm-gcc?
>> >>
>> >> In llvm-gcc (and dragonegg) this comes directly from GCC's gimple:
>> >>
>> >> f (int * p, int * q)
>> >> {
>> >>    long int D.2718;
>> >>    long int D.2717;
>> >>    long int p.1;
>> >>    long int q.0;
>> >>    int D.2714;
>> >>
>> >> <bb 2>:
>> >>    q.0_2 = (long int) q_1(D);
>> >>    p.1_4 = (long int) p_3(D);
>> >>    D.2717_5 = q.0_2 - p.1_4;
>> >>    D.2718_6 = D.2717_5 /[ex] 4;
>> >>    D.2714_7 = (int) D.2718_6;
>> >>    return D.2714_7;
>> >>
>> >> }
>> >>
>> >> Signed overflow in the difference of two long int (ptrdiff_t) values
>> >> results in undefined behaviour according to the GCC type system, which
>> >> is where the nsw flag comes from.
>> >>
>> >> The C front-end generates this gimple in the pointer_diff routine.  The
>> >> above is basically a direct transcription of what pointer_diff does.
>> >>
>> >> In short, I don't know if this is right or wrong; but if it is wrong it
>> >> seems to be a bug in GCC's C frontend.
>> >
>> > Shouldn't we cc this over to the gcc mailing list for clarification then?
>> >             Jack
>> >
>> >> Ciao, Duncan.
>
