https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114217
--- Comment #11 from Akihiko Odaki <akihiko.odaki at daynix dot com> --- (In reply to Jakub Jelinek from comment #9) > (In reply to Akihiko Odaki from comment #8) > > It would certainly workaround the issue, but it's only dirtier and brings no > > benefit except suppressed UBSan errors. Why not allow > > get_unaligned(&entry->offset) when UBSan does not complain about that hack? > > The purpose of the sanitizer is to find undefined behavior, and it found one. > If you don't want it to find undefined behavior, don't use it. > > > > If you want something that will be valid even in C, don't pass struct > > > dir_entry *entry > > > argument, but void *entry instead, and use e.g. > > > __get_unaligned_t(__typeof(((struct dir_entry *)0)->offset), ((char > > > *)entry)+offsetof(struct dir_entry, offset))) > > > You can surely hide that all under some macro. > > > > The definition still involves UB for ((struct dir_entry *)0)->offset. > > Perhaps __typeof() may be considered as an exception, but what if offsetof() > > is defined as follows? > > > > #define offsetof(T, x) ((uintptr_t)&(((T *)0)->x)) > > typeof nor sizeof operators don't evaluate their arguments unless the > expressions have VLA type. So, if there would be undefined behavior if they > were evaluated, it doesn't matter because they aren't. So, e.g. > > void > foo (int n, int *p) > { > typedef int A[n]; > A b[4]; > typedef int C[10]; > C d[4]; > int i = 0; > int j = 0; > typeof (b[++i]) e; > typeof (d[++j]) f; > typeof (d[100][4]) g; > p[0] = i; > p[1] = j; > } > > doesn't invoke undefined behavior and sets p[0] to 1 and p[1] to 0, b[++i] > had VLA type, so it got evaluated, the other 2 didn't, so ++j wasn't > evaluated and the d[100] > UB wasn't triggered, as if the program did if (n == 54) d[100][4]; and n at > runtime was > never 54. __typeof is a GNU extension but behaves the same way. > > offsetof is a standard C macro. It is not defined in the standard as > dereferencing NULL pointer, but as if there was an object of the T type and > it computed the difference between the address of the x member of it and the > start of the object. > If offsetof is defined as > #define offsetof(T, x) ((size_t)(uintptr_t)&(((T *)0)->x)) > by the implementation (and yes, e.g. GCC < 4 used to define it similarly), > then obviously the compiler can't assume there is undefined behavior on such > expressions, > but it didn't at that point, there was no UBSAN, such expressions evaluated > to a constant expression early before anything could have considered there > might be UB and after it was just the resulting constant. > If the implementation doesn't have some exception that considers expressions > like > &(((T *)0)->x) valid, then it can't use such offsetof definition and needs > to use something else, like GCC uses __builtin_offsetof. That said, GCC > AFAIK still treats expressions like the above as offsetof-like (because > there has been a lot of such code in the wild for decades), folds them to > constant right away and doesn't complain about it with ubsan; clang does > complain about it though and doesn't treat it that way. > Anyway, if you use something like ((size_t)(uintptr_t)&(((T *)0)->x)), per C > rules you are invoking UB, while if you use offsetof(T, x), it is well > defined. So there are two constructs invoking UBs but ignored by UBSan: 1) construction of misaligned pointers that do not involve a struct member, and 2) pointer arithmetic with a member of an invalid struct pointer as done by old offsetof(). If UBSan catches get_unaligned(&entry->offset), why not catch those constructs too? Perhaps 2) may not be relevant for UBSan since it's done at compile time, but 1) is. Whether 1) should be caught is a more complicated question. From a certain viewpoint, it should be caught since it's UB, and future GCC and other compilers may introduce incompatible optimization; there will be real bugs if that happens. On the other hand, the current GCC does not perform such an optimization, and clang does not either at least when a packed struct is used or __builtin_memcpy() is used and its argument is casted to void * or char * (c.f., https://github.com/llvm/llvm-project/issues/83710). In that sense, catching 1) brings only noises. In any case, the answer to the question of whether 1) should be caught should not be different from whether get_unaligned(&entry->offset) should be caught.