Hi Alex,

On Fri, Jan 20, 2023 at 2:40 PM Alejandro Colomar
<alx.manpa...@gmail.com> wrote:
>
> Hi Stefan,
>
> On 1/20/23 11:06, Stefan Puiu wrote:
> > Hi Alex,
> >
> > On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
> > <alx.manpa...@gmail.com> wrote:
> >>
> >> Hi!
> >>
> >> I just received a report about struct sockaddr_storage in the man pages.
> >> It reminded me of some concern I've always had about it: it doesn't seem
> >> to be a usable type.
> >>
> >> It has some alignment promises that make it "just work" most of the time,
> >> but it's still a UB mine, according to ISO C.
> >>
> >> According to strict aliasing rules, if you declare a variable of type
> >> 'struct sockaddr_storage', that's what you get, and trying to access it
> >> later as some other sockaddr_* is simply not legal.  The compiler may
> >> assume those accesses can't happen, and optimize as it pleases.
> >
> > Can you detail the "is not legal" part?
>
> I mean that it's Undefined Behavior contraband.

OK, next question. Is this theoretical or practical UB? People check
documentation about how to write code today, I think.

>
> > How about the APIs like
> > connect() etc that use pointers to struct sockaddr, where the
> > underlying type is different, why would that be legal while using
> > sockaddr_storage isn't?
>
> That's also bad.  However, it can be fixed by fixing `sockaddr_storage` and
> telling everyone to use it instead of using whatever other `sockaddr_*`.  You
> need a union for the underlying storage, so that the library functions can
> access both as `sockaddr` and as `sockaddr_*`.
>
> The problem isn't really in the implementation of connect(2), but in the type.
> The implementation of connect(2) would be fine if we just fixed the type.  See
> this example:
>
> struct my_sockaddr_storage {
>         union {
>                 sa_family_t          ss_family;
>                 struct sockaddr      sa;
>                 struct sockaddr_in   sin;
>                 struct sockaddr_in6  sin6;
>                 struct sockaddr_un   sun;
>         };
> };
>
>
> void
> foo(void)
> {
>         struct my_sockaddr_storage  mss;
>         struct sockaddr_storage     ss;
>
>         // initialize mss and ss
>
>         inet_sockaddr2str(&mss.sa);  // correct
>         inet_sockaddr2str((struct sockaddr *)&ss);  // UB
> }
>
> /* This function is correct, as long as the accessed object has the
>   * type we're using.  That's only possible through a `union`, since
>   * we're accessing it with 2 different types: `sockaddr` for the
>   * `sa_family` and then the appropriate subtype for the address
>   * itself.
>   */
> const char *
> inet_sockaddr2str(const struct sockaddr *sa)
> {
>         const struct sockaddr_in   *sin;
>         const struct sockaddr_in6  *sin6;
>
>         /* INET6_ADDRSTRLEN is the larger of the two text lengths. */
>         static char                buf[INET6_ADDRSTRLEN];
>
>         switch (sa->sa_family) {
>         case AF_INET:
>                 sin = (const struct sockaddr_in *) sa;
>                 inet_ntop(AF_INET, &sin->sin_addr, buf, NITEMS(buf));
>                 return buf;
>         case AF_INET6:
>                 sin6 = (const struct sockaddr_in6 *) sa;
>                 inet_ntop(AF_INET6, &sin6->sin6_addr, buf, NITEMS(buf));
>                 return buf;
>         default:
>                 errno = EAFNOSUPPORT;
>                 return NULL;
>         }
> }
>
>
> BTW, you need a union _even if_ you only care about a single address family.
> That is, if you only care about Unix sockets, you can't declare your variable
> of type sockaddr_un, because the libc functions and syscalls still need to
> access it as a sockaddr to see which family it has.
>
> > Will code break in practice?
>
> Well, it depends on how much compilers advance.  Here's an interesting
> experiment:
>
> <https://software.codidact.com/posts/287748/287750#answer-287750>

That code plays with 2 pointers to the same area, one to double and
one to int, so I don't think it's that similar to the sockaddr
situation. At least for struct sockaddr, the sa_family field is the
same for all struct sockaddr_* variants. Also, in practical terms, I
don't think any compiler optimization that breaks socket APIs (and, if
I recall correctly, there are instances of this pattern in the kernel
as well) is going to be an easy sell. It's possible, but realistically
speaking, I don't think it's going to happen.
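
To make the sa_family point concrete, here is a minimal sketch (the
macro and function names are made up for illustration) that checks at
compile time that the family member has the same offset and size in the
structures discussed in this thread. It only checks layout; it says
nothing about what the aliasing rules allow.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper macro: true if member `m` of type `t` has the same
 * offset and the same size as `sa_family` in struct sockaddr.
 */
#define SAME_FAMILY_FIELD(t, m)                                         \
        (offsetof(t, m) == offsetof(struct sockaddr, sa_family) &&      \
         sizeof(((t *)0)->m) == sizeof(((struct sockaddr *)0)->sa_family))

/* Compile-time checks: the build fails if any layout differs. */
static_assert(SAME_FAMILY_FIELD(struct sockaddr_in, sin_family),
              "sockaddr_in family field differs from sockaddr");
static_assert(SAME_FAMILY_FIELD(struct sockaddr_in6, sin6_family),
              "sockaddr_in6 family field differs from sockaddr");
static_assert(SAME_FAMILY_FIELD(struct sockaddr_un, sun_family),
              "sockaddr_un family field differs from sockaddr");
static_assert(SAME_FAMILY_FIELD(struct sockaddr_storage, ss_family),
              "sockaddr_storage family field differs from sockaddr");

/* Hypothetical run-time wrapper around the same checks; returns 1 or 0. */
int
family_fields_match(void)
{
        return SAME_FAMILY_FIELD(struct sockaddr_in, sin_family) &&
               SAME_FAMILY_FIELD(struct sockaddr_in6, sin6_family) &&
               SAME_FAMILY_FIELD(struct sockaddr_un, sun_family) &&
               SAME_FAMILY_FIELD(struct sockaddr_storage, ss_family);
}
```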

>
> I wouldn't rely on Undefined Behavior not causing nasal demons.  When you get
> them, you can only kill them with garlic.

OK, but not all theoretical issues have practical implications. Is
there code that can show UB in practical terms with struct
sockaddr_storage today? Like Eric mentioned in another thread, does
UBSan complain about code using struct sockaddr_storage?
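
As a baseline for that kind of test, here is a minimal sketch of an
access pattern that does not depend on the answer: the family is read
through the sockaddr_storage itself, and the address is copied out with
memcpy() into an object of the matching type before any member is read.
The function name ss_to_str() is made up for illustration. A candidate
reproducer would replace the memcpy() calls with pointer casts and
compare behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* The copies below rely on the storage being at least this large. */
static_assert(sizeof(struct sockaddr_storage) >= sizeof(struct sockaddr_in6),
              "sockaddr_storage too small for sockaddr_in6");

/*
 * Hypothetical helper: write the textual address held in `ss` to `buf`.
 * Every member access goes through an object of its declared type.
 * Returns `buf` on success, or NULL with errno set on failure.
 */
const char *
ss_to_str(const struct sockaddr_storage *ss, char *buf, socklen_t size)
{
        struct sockaddr_in   sin;
        struct sockaddr_in6  sin6;

        switch (ss->ss_family) {
        case AF_INET:
                memcpy(&sin, ss, sizeof(sin));
                return inet_ntop(AF_INET, &sin.sin_addr, buf, size);
        case AF_INET6:
                memcpy(&sin6, ss, sizeof(sin6));
                return inet_ntop(AF_INET6, &sin6.sin6_addr, buf, size);
        default:
                errno = EAFNOSUPPORT;
                return NULL;
        }
}
```

A caller would pass a buffer of at least INET6_ADDRSTRLEN bytes, which
covers both address families.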

Thanks,
Stefan.

>
> >
> >>
> >> That means that one needs to declare a union with all possible sockaddr_*
> >> types that are of interest, so that access as any of them is later allowed
> >> by the compiler (the user still needs to access the correct one, but that
> >> goes without saying).
> >>
> >> In that union, one could add a member that is of type sockaddr_storage for
> >> getting a more consistent structure size (for example, if some members are
> >> conditional on preprocessor stuff), but I don't see much value in that.
> >> Especially, given this comment that Igor Sysoev wrote in NGINX Unit's
> >> source code:
> >>
> >>    * struct sockaddr_storage is:
> >>    *    128 bytes on Linux, FreeBSD, MacOSX, NetBSD;
> >>    *    256 bytes on Solaris, OpenBSD, and HP-UX;
> >>    *   1288 bytes on AIX.
> >>    *
> >>    * struct sockaddr_storage is too large on some platforms
> >>    * or less than real maximum struct sockaddr_un length.
> >>
> >> Which makes it even more useless as a type.
> >
> > I'm not sure using struct sockaddr_storage for storing sockaddr_un's
> > (UNIX domain socket addresses, right?) is that common a usage. I've
> > used it in the past to store either a sockaddr_in or a sockaddr_in6,
> > and I think that would be a more common scenario. The comment above
> > probably makes sense for nginx, but different projects have different
> > needs.
> >
> > As for the size, I guess it might matter if you want to port your code
> > to AIX, Solaris, OpenBSD etc. I don't think all software is meant to
> > be portable, though (or portable to those platforms). Maybe a warning
> > is in order that, for portable code, developers should check its size
> > on the other platforms targeted.
>
> The size thing is just an added problem.  The deep problem is that you need to
> use a union that contains all types that you care about _plus_ plain sockaddr,
> because the structure will be accessed at least as a sockaddr, plus one of the
> different specialized structures.  So even for only sockaddr_un, you need at
> least the following:
>
> union my_unix_sockaddr {
>         struct sockaddr     sa;
>         struct sockaddr_un  sun;
> };
>
> Not doing that will necessarily result in invoking Undefined Behavior at
> some point.
>
> >
> > Just my 2 cents, as always,
> > Stefan.
>
> The good thing is that fixing sockaddr_storage and telling everybody to use it
> always fixes the problem, so I'm preparing a patch for glibc.
>
> Cheers,
>
> Alex
>
> >
> >>
> >>
> >> Should we warn about uses of this type?  Should we recommend against using
> >> it in the manual page, since there are no legitimate uses of it?
> >>
> >> Cheers,
> >>
> >> Alex
> >>
> >> --
> >> <http://www.alejandro-colomar.es/>
>
> --
> <http://www.alejandro-colomar.es/>
