Hi Stefan,

On 1/20/23 11:06, Stefan Puiu wrote:
Hi Alex,

On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar
<alx.manpa...@gmail.com> wrote:

Hi!

I just received a report about struct sockaddr_storage in the man pages.  It
reminded me of some concern I've always had about it: it doesn't seem to be a
usable type.

It has some alignment promises that make it "just work" most of the time, but
it's still a UB mine, according to ISO C.

According to strict aliasing rules, if you declare a variable of type 'struct
sockaddr_storage', that's what you get, and trying to access it later as some
other sockaddr_8 is simply not legal.  The compiler may assume those accesses
can't happen, and optimize as it pleases.

Can you detail the "is not legal" part?

I mean that it's Undefined Behavior contraband.

How about the APIs like
connect() etc that use pointers to struct sockaddr, where the
underlying type is different, why would that be legal while using
sockaddr_storage isn't?

That's also bad. However, it can be fixed by fixing `sockaddr_storage` and telling everyone to use it instead of using whatever other `sockaddr_*`. You need a union for the underlying storage, so that the library functions can access both as `sockaddr` and as `sockaddr_*`.

The problem isn't really in the implementation of connect(2), but on the type. The implementation of connect(2) would be fine if we just fixed the type. See some example:

struct my_sockaddr_storage {
        union {
                sa_family_t          ss_family;
                struct sockaddr      sa;
                struct sockaddr_in   sin;
                struct sockaddr_in6  sin6;
                struct sockaddr_un   sun;
        };
};


void
foo(foo)
{
        struct my_sockaddr_storage  mss;
        struct sockaddr_storage     ss;

        // initialize mss and ss

        inet_sockaddr2str(&mss.sa);  // correct
        inet_sockaddr2str((struct sockaddr_storage *)&ss);  // UB
}

/* This function is correct, as far as the accessed object has the
 * type we're using.  That's only possible through a `union`, since
 * we're accessing it with 2 different types: `sockaddr` for the
 * `sa_family` and then the appropriate subtype for the address
 * itself.
 */
const char *
inet_sockaddr2str(const struct sockaddr *sa)
{
        struct sockaddr_in   *sin;
        struct sockaddr_in6  *sin6;

        static char          buf[INET_ADDRSTRLENMAX];

        switch (sa->sa_family) {
        case AF_INET:
                sin = (struct sockaddr_in *) sa;
                inet_ntop(AF_INET, &sin->sin_addr, buf, NITEMS(buf));
                return buf;
        case AF_INET6:
                sin6 = (struct sockaddr_in6 *) sa;
                inet_ntop(AF_INET6, &sin6->sin6_addr, buf, NITEMS(buf));
                return buf;
        default:
                errno = EAFNOSUPPORT;
                return NULL;
        }
}


BTW, you need a union _even if_ you only care about a single address family. That is, if you only care about Unix sockets, you can't declare your variable of type sockaddr_un, because the libc functions and syscalls still need to access it as a sockaddr to see which family it has.

Will code break in practice?

Well, it depends on how much compilers advance.  Here's some interesting 
experiment:

<https://software.codidact.com/posts/287748/287750#answer-287750>

I wouldn't rely on Undefined Behavior not causing nasal demons. When you get them, you can only kill them with garlic.



That means that one needs to declare a union with all possible sockaddr_* types
that are of interest, so that access as any of them is later allowed by the
compiler (of course, the user still needs to access the correct one, but that's
of course).

In that union, one could add a member that is of type sockaddr_storage for
getting a more consistent structure size (for example, if some members are
conditional on preprocessor stuff), but I don't see much value in that.
Especially, given this comment that Igor Sysoev wrote in NGINX Unit's source 
code:

   * struct sockaddr_storage is:
   *    128 bytes on Linux, FreeBSD, MacOSX, NetBSD;
   *    256 bytes on Solaris, OpenBSD, and HP-UX;
   *   1288 bytes on AIX.
   *
   * struct sockaddr_storage is too large on some platforms
   * or less than real maximum struct sockaddr_un length.

Which makes it even more useless as a type.

I'm not sure using struct sockaddr_storage for storing sockaddr_un's
(UNIX domain socket addresses, right?) is that common a usage. I've
used it in the past to store either a sockaddr_in or a sockaddr_in6,
and I think that would be a more common scenario. The comment above
probably makes sense for nginx, but different projects have different
needs.

As for the size, I guess it might matter if you want to port your code
to AIX, Solaris, OpenBSD etc. I don't think all software is meant to
be portable, though (or portable to those platforms). Maybe a warning
is in order that, for portable code, developers should check its size
on the other platforms targeted.

The size thing is just an added problem. The deep problem is that you need to use a union that contains all types that you care about _plus_ plain sockaddr, because the structure will be accessed at least as a sockaddr, plus one of the different specialized structures. So even for only sockaddr_un, you need at least the following:

union my_unix_sockaddr {
        struct sockaddr     sa;
        struct sockaddr_un  sun;
};

Not doing that will necessarily result in invoking Undefined Behavior at some 
point.


Just my 2 cents, as always,
Stefan.

The good thing is that fixing sockaddr_storage and telling everybody to use it always fixes the problem, so I'm preparing a patch for glibc.

Cheers,

Alex




Should we warn about uses of this type?  Should we recommend against using it in
the manual page, since there's no legitimate uses of it?

Cheers,

Alex

--
<http://www.alejandro-colomar.es/>

--
<http://www.alejandro-colomar.es/>

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

Reply via email to