hfinkel added a comment. In https://reviews.llvm.org/D32199#732737, @rsmith wrote:
> In https://reviews.llvm.org/D32199#732189, @hfinkel wrote: > > > In https://reviews.llvm.org/D32199#731472, @rsmith wrote: > > > > > 1. C's "effective type" rule allows writes to set the type pretty much > > > unconditionally, unless the storage is for a variable with a declared type > > > > > > To come back to this point: We don't really implement these rules now, and > > it is not clear that we will. The problem here is that, if we take the > > specification literally, then we can't use our current TBAA at all. The > > problem is that if we have: > > > > write x, !tbaa "int" > > read x, !tbaa "int" > > write x, !tbaa "float" > > > > > > TBAA will currently tell us that the "float" write aliases with neither the > > preceding read nor the preceding write. > > > Right, C's TBAA rules do not (in general) permit a store to be reordered > before a memory operation of a different type, they only allow loads to be > moved before stores. (Put another way, they do not tell you that pointers > point to distinct memory locations, just that a stored value cannot be > observed by a load of a different type.) You get the more general "distinct > memory locations" result only for objects of a declared type. > > C++ is similar, except that (because object lifetimes do not currently begin > magically due to a store) you /can/ reorder stores past a memory operation of > a different type if you know no object's lifetime began in between. (But > currently we do not record all lifetime events in IR, so we can't do that > today. Also, we may be about to lose the property that you can statically > determine a small number of places that might start an object lifetime.) > > > Also, a strict reading of C's access rules seems to rule out the premise > > underlying our struct-path TBAA entirely. So long as I'm accessing a value > > using a struct that has some member, including recursively, with that type, > > then it's fine. The matching of the relative offsets is a sufficient, but > > not necessary, condition for well-defined access. C++ has essentially the > > same language (and, thus, potentially the same problem). > > I agree this rule is garbage, but it's not as permissive as I think you're > suggesting. The rule says that you can use an lvalue of struct type to access > memory of struct field type. In C this happens during struct assignment, for > instance. It does *not* permit using an lvalue of struct field type to access > unrelated fields of the same struct. So C appears to allow this nonsense: > > char *p = malloc(8); > *(int*)p = 0; > *(int*)(p + 4) = 0; > struct S {int n; float f;} s = *(struct S*)p; // use lvalue of type `struct > S` to access object of effective type `int`, to initialize a `float` > > > but not this nonsense: > > float q = ((struct S*)p)->f; // ub, cannot use lvalue of type `float` to > access object of effective type `int` > > > ... which just means that we can't make much use of TBAA when emitting struct > copies in C. > > In C++, on the other hand, the rule is even more garbage, since there is no > way to perform a memory access with a glvalue of class type. (The closest you > get is that a defaulted union construction/assignment copies the object > representation, but that's expressed in terms of copying a sequence of > unsigned chars, and in any case those are member functions and so already > require an object of the correct type to exist.) See wg21.link/cwg2051 Our struct-path TBAA does the following: struct X { int a, b; }; X x { 50, 100 }; X *o = (X*) (((int*) &x) + 1); int a_is_b = o->a; // This is UB (or so we say)? Because we assume that the (type, offset) tuples are identified entities in the type-aliasing tree. Practically speaking, this certainly makes sense to me. However, I don't see anything in the language that actually forbids this behavior. In case it matters, because in the above case the type of the struct actually matches, we similarly forbid: struct X { int a, b; }; struct Y { int a; float b; }; X x { 50, 100 }; Y *o = (X*) (((int*) &x) + 1); int a_is_b = o->a; // This is UB (or so we say)? as is this: struct X { int a, b; }; struct Y { int a; float b; X h; /* in case this matters for the aggregate members thing */ }; X x { 50, 100 }; Y *o = (X*) (((int*) &x) + 1); int a_is_b = o->a; // This is UB (or so we say)? (although, as you say, this shouldn't matter in C++ because we don't have struct glvalues) In any case, am I missing something? > > >> While I'd like the sanitizer to follow the rules, that's really useful only >> to the extent that the compilers follow the rules. If the compilers are >> making stronger assumptions, then I think that the sanitizer should also. > > I agree. > >>> If we want to follow the relevant language rules by default, that would >>> suggest that "writes always set the type" should be enabled by default in C >>> and disabled by default in C++. That may not be the right decision for >>> other reasons, though. In C++, writes through union members and >>> new-expressions should probably (re)set the type (do you have intrinsics >>> the frontend can use to do so?). >> >> Also, in C, memcpy gets to copy the type for a destination that does not >> have declared types. It looks like the same is true for C++ for >> trivially-copyable types. Is the first read/write sets the unknown type >> (i.e. memory from malloc/calloc/memset, etc.) correct for C++ also? > > As I recall, "store can create an object" is the broad direction that SG12 > agreed on for the cases where you have a pointer into a raw storage buffer > (that is, a char array), and we want the low-level storage allocation > functions to give us such a buffer. What about a read after a calloc (or memset)? https://reviews.llvm.org/D32199 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits