rsmith added a comment.

In https://reviews.llvm.org/D32199#732189, @hfinkel wrote:

> In https://reviews.llvm.org/D32199#731472, @rsmith wrote:
>
> > 1. C's "effective type" rule allows writes to set the type pretty much 
> > unconditionally, unless the storage is for a variable with a declared type
>
>
> To come back to this point: We don't really implement these rules now, and it 
> is not clear that we will. The problem here is that, if we take the 
> specification literally, then we can't use our current TBAA at all. The 
> problem is that if we have:
>
>   write x, !tbaa "int"
>   read x, !tbaa "int"
>   write x, !tbaa "float"
>   
>
> TBAA will currently tell us that the "float" write aliases with neither the 
> preceding read nor the preceding write.


Right, C's TBAA rules do not (in general) permit a store to be reordered before 
a memory operation of a different type, they only allow loads to be moved 
before stores. (Put another way, they do not tell you that pointers point to 
distinct memory locations, just that a stored value cannot be observed by a 
load of a different type.) You get the more general "distinct memory locations" 
result only for objects of a declared type.

C++ is similar, except that (because object lifetimes do not currently begin 
magically due to a store) you /can/ reorder stores past a memory operation of a 
different type if you know no object's lifetime began in between. (But 
currently we do not record all lifetime events in IR, so we can't do that 
today. Also, we may be about to lose the property that you can statically 
determine a small number of places that might start an object lifetime.)

> Also, a strict reading of C's access rules seems to rule out the premise 
> underlying our struct-path TBAA entirely. So long as I'm accessing a value 
> using a struct that has some member, including recursively, with that type, 
> then it's fine. The matching of the relative offsets is a sufficient, but not 
> necessary, condition for well-defined access. C++ has essentially the same 
> language (and, thus, potentially the same problem).

I agree this rule is garbage, but it's not as permissive as I think you're 
suggesting. The rule says that you can use an lvalue of struct type to access 
memory of struct field type. In C this happens during struct assignment, for 
instance. It does *not* permit using an lvalue of struct field type to access 
unrelated fields of the same struct. So C appears to allow this nonsense:

  char *p = malloc(8);
  *(int*)p = 0;
  *(int*)(p + 4) = 0;
  struct S {int n; float f;} s = *(struct S*)p; // use lvalue of type `struct 
S` to access object of effective type `int`, to initialize a `float`

but not this nonsense:

  float q = ((struct S*)p)->f; // ub, cannot use lvalue of type `float` to 
access object of effective type `int`

... which just means that we can't make much use of TBAA when emitting struct 
copies in C.

In C++, on the other hand, the rule is even more garbage, since there is no way 
to perform a memory access with a glvalue of class type. (The closest you get 
is that a defaulted union construction/assignment copies the object 
representation, but that's expressed in terms of copying a sequence of unsigned 
chars, and in any case those are member functions and so already require an 
object of the correct type to exist.) See wg21.link/cwg2051

> While I'd like the sanitizer to follow the rules, that's really useful only 
> to the extent that the compilers follow the rules. If the compilers are 
> making stronger assumptions, then I think that the sanitizer should also.

I agree.

>> If we want to follow the relevant language rules by default, that would 
>> suggest that "writes always set the type" should be enabled by default in C 
>> and disabled by default in C++. That may not be the right decision for other 
>> reasons, though. In C++, writes through union members and new-expressions 
>> should probably (re)set the type (do you have intrinsics the frontend can 
>> use to do so?).
> 
> Also, in C, memcpy gets to copy the type for a destination that does not have 
> declared types. It looks like the same is true for C++ for trivially-copyable 
> types. Is the first read/write sets the unknown type (i.e. memory from 
> malloc/calloc/memset, etc.) correct for C++ also?

As I recall, "store can create an object" is the broad direction that SG12 
agreed on for the cases where you have a pointer into a raw storage buffer 
(that is, a char array), and we want the low-level storage allocation functions 
to give us such a buffer.


https://reviews.llvm.org/D32199



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to