Hi Alex, On Fri, Jan 20, 2023 at 2:40 PM Alejandro Colomar <alx.manpa...@gmail.com> wrote: > > Hi Stefan, > > On 1/20/23 11:06, Stefan Puiu wrote: > > Hi Alex, > > > > On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar > > <alx.manpa...@gmail.com> wrote: > >> > >> Hi! > >> > >> I just received a report about struct sockaddr_storage in the man pages. > >> It > >> reminded me of some concern I've always had about it: it doesn't seem to > >> be a > >> usable type. > >> > >> It has some alignment promises that make it "just work" most of the time, > >> but > >> it's still a UB mine, according to ISO C. > >> > >> According to strict aliasing rules, if you declare a variable of type > >> 'struct > >> sockaddr_storage', that's what you get, and trying to access it later as > >> some > >> other sockaddr_8 is simply not legal. The compiler may assume those > >> accesses > >> can't happen, and optimize as it pleases. > > > > Can you detail the "is not legal" part? > > I mean that it's Undefined Behavior contraband.
OK, next question. Is this theoretical or practical UB? People check documentation about how to write code today, I think. > > > How about the APIs like > > connect() etc that use pointers to struct sockaddr, where the > > underlying type is different, why would that be legal while using > > sockaddr_storage isn't? > > That's also bad. However, it can be fixed by fixing `sockaddr_storage` and > telling everyone to use it instead of using whatever other `sockaddr_*`. You > need a union for the underlying storage, so that the library functions can > access both as `sockaddr` and as `sockaddr_*`. > > The problem isn't really in the implementation of connect(2), but on the type. > The implementation of connect(2) would be fine if we just fixed the type. See > some example: > > struct my_sockaddr_storage { > union { > sa_family_t ss_family; > struct sockaddr sa; > struct sockaddr_in sin; > struct sockaddr_in6 sin6; > struct sockaddr_un sun; > }; > }; > > > void > foo(foo) > { > struct my_sockaddr_storage mss; > struct sockaddr_storage ss; > > // initialize mss and ss > > inet_sockaddr2str(&mss.sa); // correct > inet_sockaddr2str((struct sockaddr_storage *)&ss); // UB > } > > /* This function is correct, as far as the accessed object has the > * type we're using. That's only possible through a `union`, since > * we're accessing it with 2 different types: `sockaddr` for the > * `sa_family` and then the appropriate subtype for the address > * itself. > */ > const char * > inet_sockaddr2str(const struct sockaddr *sa) > { > struct sockaddr_in *sin; > struct sockaddr_in6 *sin6; > > static char buf[INET_ADDRSTRLENMAX]; > > switch (sa->sa_family) { > case AF_INET: > sin = (struct sockaddr_in *) sa; > inet_ntop(AF_INET, &sin->sin_addr, buf, NITEMS(buf)); > return buf; > case AF_INET6: > sin6 = (struct sockaddr_in6 *) sa; > inet_ntop(AF_INET6, &sin6->sin6_addr, buf, NITEMS(buf)); > return buf; > default: > errno = EAFNOSUPPORT; > return NULL; > } > } > > > BTW, you need a union _even if_ you only care about a single address family. > That is, if you only care about Unix sockets, you can't declare your variable > of > type sockaddr_un, because the libc functions and syscalls still need to access > it as a sockaddr to see which family it has. > > > Will code break in practice? > > Well, it depends on how much compilers advance. Here's some interesting > experiment: > > <https://software.codidact.com/posts/287748/287750#answer-287750> That code plays with 2 pointers to the same area, one to double and one to int, so I don't think it's that similar to the sockaddr situation. At least for struct sockaddr, the sa_family field is the same for all struct sockaddr_* variants. Also, in practical terms, I don't think any compiler optimization that breaks socket APIs (and, if I recall correctly, there are instances of this pattern in the kernel as well) is going to be an easy sell. It's possible, but realistically speaking, I don't think it's going to happen. > > I wouldn't rely on Undefined Behavior not causing nasal demons. When you get > them, you can only kill them with garlic. OK, but not all theoretical issues have practical implications. Is there code that can show UB in practical terms with struct sockaddr_storage today? Like Eric mentioned in another thread, does UBSan complain about code using struct sockaddr_storage? Thanks, Stefan. > > > > >> > >> That means that one needs to declare a union with all possible sockaddr_* > >> types > >> that are of interest, so that access as any of them is later allowed by the > >> compiler (of course, the user still needs to access the correct one, but > >> that's > >> of course). > >> > >> In that union, one could add a member that is of type sockaddr_storage for > >> getting a more consistent structure size (for example, if some members are > >> conditional on preprocessor stuff), but I don't see much value in that. > >> Especially, given this comment that Igor Sysoev wrote in NGINX Unit's > >> source code: > >> > >> * struct sockaddr_storage is: > >> * 128 bytes on Linux, FreeBSD, MacOSX, NetBSD; > >> * 256 bytes on Solaris, OpenBSD, and HP-UX; > >> * 1288 bytes on AIX. > >> * > >> * struct sockaddr_storage is too large on some platforms > >> * or less than real maximum struct sockaddr_un length. > >> > >> Which makes it even more useless as a type. > > > > I'm not sure using struct sockaddr_storage for storing sockaddr_un's > > (UNIX domain socket addresses, right?) is that common a usage. I've > > used it in the past to store either a sockaddr_in or a sockaddr_in6, > > and I think that would be a more common scenario. The comment above > > probably makes sense for nginx, but different projects have different > > needs. > > > > As for the size, I guess it might matter if you want to port your code > > to AIX, Solaris, OpenBSD etc. I don't think all software is meant to > > be portable, though (or portable to those platforms). Maybe a warning > > is in order that, for portable code, developers should check its size > > on the other platforms targeted. > > The size thing is just an added problem. The deep problem is that you need to > use a union that contains all types that you care about _plus_ plain sockaddr, > because the structure will be accessed at least as a sockaddr, plus one of the > different specialized structures. So even for only sockaddr_un, you need at > least the following: > > union my_unix_sockaddr { > struct sockaddr sa; > struct sockaddr_un sun; > }; > > Not doing that will necessarily result in invoking Undefined Behavior at some > point. > > > > > Just my 2 cents, as always, > > Stefan. > > The good thing is that fixing sockaddr_storage and telling everybody to use it > always fixes the problem, so I'm preparing a patch for glibc. > > Cheers, > > Alex > > > > >> > >> > >> Should we warn about uses of this type? Should we recommend against using > >> it in > >> the manual page, since there's no legitimate uses of it? > >> > >> Cheers, > >> > >> Alex > >> > >> -- > >> <http://www.alejandro-colomar.es/> > > -- > <http://www.alejandro-colomar.es/>