hfinkel added a comment.

In https://reviews.llvm.org/D32199#732737, @rsmith wrote:

> In https://reviews.llvm.org/D32199#732189, @hfinkel wrote:
>
> > In https://reviews.llvm.org/D32199#731472, @rsmith wrote:
> >
> > > 1. C's "effective type" rule allows writes to set the type pretty much 
> > > unconditionally, unless the storage is for a variable with a declared type
> >
> >
> > To come back to this point: We don't really implement these rules now, and 
> > it is not clear that we will. The problem here is that, if we take the 
> > specification literally, then we can't use our current TBAA at all. The 
> > problem is that if we have:
> >
> >   write x, !tbaa "int"
> >   read x, !tbaa "int"
> >   write x, !tbaa "float"
> >   
> >
> > TBAA will currently tell us that the "float" write aliases with neither the 
> > preceding read nor the preceding write.
>
>
> Right, C's TBAA rules do not (in general) permit a store to be reordered 
> before a memory operation of a different type, they only allow loads to be 
> moved before stores. (Put another way, they do not tell you that pointers 
> point to distinct memory locations, just that a stored value cannot be 
> observed by a load of a different type.) You get the more general "distinct 
> memory locations" result only for objects of a declared type.
>
> C++ is similar, except that (because object lifetimes do not currently begin 
> magically due to a store) you /can/ reorder stores past a memory operation of 
> a different type if you know no object's lifetime began in between. (But 
> currently we do not record all lifetime events in IR, so we can't do that 
> today. Also, we may be about to lose the property that you can statically 
> determine a small number of places that might start an object lifetime.)
>
> > Also, a strict reading of C's access rules seems to rule out the premise 
> > underlying our struct-path TBAA entirely. So long as I'm accessing a value 
> > using a struct that has some member, including recursively, with that type, 
> > then it's fine. The matching of the relative offsets is a sufficient, but 
> > not necessary, condition for well-defined access. C++ has essentially the 
> > same language (and, thus, potentially the same problem).
>
> I agree this rule is garbage, but it's not as permissive as I think you're 
> suggesting. The rule says that you can use an lvalue of struct type to access 
> memory of struct field type. In C this happens during struct assignment, for 
> instance. It does *not* permit using an lvalue of struct field type to access 
> unrelated fields of the same struct. So C appears to allow this nonsense:
>
>   char *p = malloc(8);
>   *(int*)p = 0;
>   *(int*)(p + 4) = 0;
>   struct S {int n; float f;} s = *(struct S*)p; // use lvalue of type `struct 
> S` to access object of effective type `int`, to initialize a `float`
>
>
> but not this nonsense:
>
>   float q = ((struct S*)p)->f; // ub, cannot use lvalue of type `float` to 
> access object of effective type `int`
>
>
> ... which just means that we can't make much use of TBAA when emitting struct 
> copies in C.
>
> In C++, on the other hand, the rule is even more garbage, since there is no 
> way to perform a memory access with a glvalue of class type. (The closest you 
> get is that a defaulted union construction/assignment copies the object 
> representation, but that's expressed in terms of copying a sequence of 
> unsigned chars, and in any case those are member functions and so already 
> require an object of the correct type to exist.) See wg21.link/cwg2051


Our struct-path TBAA does the following:

  struct X { int a, b; };
  X x { 50, 100 };
  X *o = (X*) (((int*) &x) + 1);
  
  int a_is_b = o->a; // This is UB (or so we say)?


Because we assume that the (type, offset) tuples are identified entities in the 
type-aliasing tree. Practically speaking, this certainly makes sense to me. 
However, I don't see anything in the language that actually forbids this 
behavior. In case it matters, because in the above case the type of the struct 
actually matches, we similarly forbid:

  struct X { int a, b; };
  struct Y { int a; float b; };
  X x { 50, 100 };
  Y *o = (X*) (((int*) &x) + 1);
  
  int a_is_b = o->a; // This is UB (or so we say)?

as is this:

  struct X { int a, b; };
  struct Y { int a; float b; X h; /* in case this matters for the aggregate 
members thing */ };
  X x { 50, 100 };
  Y *o = (X*) (((int*) &x) + 1);
  
  int a_is_b = o->a; // This is UB (or so we say)?

(although, as you say, this shouldn't matter in C++ because we don't have 
struct glvalues)

In any case, am I missing something?

> 
> 
>> While I'd like the sanitizer to follow the rules, that's really useful only 
>> to the extent that the compilers follow the rules. If the compilers are 
>> making stronger assumptions, then I think that the sanitizer should also.
> 
> I agree.
> 
>>> If we want to follow the relevant language rules by default, that would 
>>> suggest that "writes always set the type" should be enabled by default in C 
>>> and disabled by default in C++. That may not be the right decision for 
>>> other reasons, though. In C++, writes through union members and 
>>> new-expressions should probably (re)set the type (do you have intrinsics 
>>> the frontend can use to do so?).
>> 
>> Also, in C, memcpy gets to copy the type for a destination that does not 
>> have declared types. It looks like the same is true for C++ for 
>> trivially-copyable types. Is the first read/write sets the unknown type 
>> (i.e. memory from malloc/calloc/memset, etc.) correct for C++ also?
> 
> As I recall, "store can create an object" is the broad direction that SG12 
> agreed on for the cases where you have a pointer into a raw storage buffer 
> (that is, a char array), and we want the low-level storage allocation 
> functions to give us such a buffer.

What about a read after a calloc (or memset)?


https://reviews.llvm.org/D32199



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to