Hi Peter, first, thanks for bringing this here.
On Tue, Jul 20, 2021 at 01:13:58AM -0500, Peter Jin wrote:
> 1. The network namespace support seems to be a bit broken. In the function
> "my_socketat" (lines 114-129 of src/namespace.c in the latest dev branch),
> you attempt to first change network namespace to the desired namespace, and
> then change back to the default namespace. While this is correct, this does
> not work in two cases, both involving user namespaces:
>
> * First, HAProxy could be running in a non-initial user namespace with a
> full set of capabilities (including CAP_SYS_ADMIN), but the network
> namespace is still associated with the initial user namespace. Such an
> environment could be simulated with "unshare -r" (omitting -n), or by using
> a container runtime that supported user namespaces but the network namespace
> is still associated with the host (e.g. docker run --net=host, if it were
> supported in userns-remap mode). In this case, the first setns() would
> succeed, but the setns() back to the original namespace would fail because
> HAProxy would not have the CAP_SYS_ADMIN capability in the original network
> namespace (it is owned by the initial user namespace).

Interesting. However, this will return a socket error, so the problem will be detected. In the case of a listener, if the user loses CAP_SYS_ADMIN, this will fail during startup and will be visible. I'm more concerned about the risk of a runtime failure when connecting to a server in a particular namespace. The real problem I'm seeing is that there is no boot-time check verifying that returning from the temporary namespace still works. At the very least we could try during boot, for each server-side namespace, to enter and leave it and verify that this doesn't fail; otherwise we'd quit.
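A minimal sketch of such a boot-time check, assuming Linux setns() and namespace files reachable by path. netns_roundtrip_check() and check_netns_path() are hypothetical names, not HAProxy functions, and this is not the code in src/namespace.c; it only performs the same enter/leave pair without creating a socket.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>   /* setns(), CLONE_NEWNET */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: enter the namespace behind <target_fd>, then return to
 * <orig_fd>. Returns 0 on success, -1 with errno set on failure. */
static int netns_roundtrip_check(int orig_fd, int target_fd)
{
    if (setns(target_fd, CLONE_NEWNET) < 0)
        return -1;   /* cannot even enter the target namespace */
    if (setns(orig_fd, CLONE_NEWNET) < 0)
        return -1;   /* entered but cannot come back: the process is now
                      * stuck in the target namespace, so at boot the only
                      * sane reaction is to exit */
    return 0;
}

/* Hypothetical wrapper: open the current and target namespaces, run the
 * round trip, report the error, and preserve errno for the caller. */
static int check_netns_path(const char *path)
{
    int orig_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
    if (orig_fd < 0)
        return -1;

    int target_fd = open(path, O_RDONLY | O_CLOEXEC);
    if (target_fd < 0) {
        int e = errno;
        close(orig_fd);
        errno = e;
        return -1;
    }

    int ret = netns_roundtrip_check(orig_fd, target_fd);
    int saved = errno;
    if (ret < 0)
        fprintf(stderr, "netns %s: %s\n", path, strerror(saved));
    close(target_fd);
    close(orig_fd);
    errno = saved;
    return ret;
}
```

At startup, a caller would run check_netns_path() for every namespace referenced by a server and quit on the first failure, so a broken setup is reported immediately rather than on the first connection attempt.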
> To mitigate this,
> HAProxy would need to fork a new process, call setns() and create a socket in
> the new process, and then transfer the socket back to the original process
> using SCM_RIGHTS (you can probably reuse the code in proto_sockpair.c or
> some other file mentioning SCM_RIGHTS to do that).

Unfortunately that's really not workable: it would cause terrible latencies that are in no way compatible with HTTP usage. It *might* work for listeners but not for outgoing connections, and whatever solution we find for those will work on both sides.

> * Second, HAProxy could be running as a non-root user, and at least one
> "rootless" container with a separated network namespace exists for that
> user. It would be nice if HAProxy could create a socket in such a network
> namespace without root privileges. Judging by what I already see in the
> code, that does not seem to be possible as it currently stands.

While we strongly discourage running as root, I can indeed imagine that this can be an issue with setns().

> The solution
> to solve this case is identical to the first case; the only difference is
> that you also have to enter the associated user namespace first (hint: you
> can use the NS_GET_USERNS ioctl on the target network namespace to obtain a
> file descriptor to that user namespace, which you can pass to setns()) and
> set PR_SET_DUMPABLE to 0 before entering the user namespace for security.
>
> These techniques have already been employed in software like "slirp4netns",
> which creates a TUN/TAP device in a given network namespace, and handles
> both of the above cases correctly. The only difference is that for HAProxy,
> we should be creating a socket instead, but the overall technique is still
> the same.

Yes, but while that's probably totally affordable in terms of latency for the creation of a tunnel, it definitely is not for creating an outgoing socket.
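For reference, the helper-process technique described in the quoted proposal could look roughly like this. It is a hedged sketch assuming Linux 4.9 or later (for NS_GET_USERNS); socket_via_helper(), child_make_socket(), send_fd() and recv_fd() are hypothetical names, not the proto_sockpair.c code, and the "netns_fd < 0 means stay in the current namespace" convention exists only to keep the sketch testable.

```c
#define _GNU_SOURCE
#include <linux/nsfs.h>   /* NS_GET_USERNS (Linux 4.9+) */
#include <sched.h>        /* setns(), CLONE_NEWNET, CLONE_NEWUSER */
#include <string.h>       /* memcpy() */
#include <sys/ioctl.h>
#include <sys/prctl.h>    /* PR_SET_DUMPABLE */
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: optionally join the user namespace owning the target netns,
 * then the netns itself, then create the socket there. The forked child is
 * single-threaded, which setns(CLONE_NEWUSER) requires. */
static int child_make_socket(int netns_fd, int enter_userns, int domain, int type)
{
    if (netns_fd >= 0) {
        if (enter_userns) {
            int userns_fd = ioctl(netns_fd, NS_GET_USERNS);
            if (userns_fd < 0)
                return -1;
            /* Become non-dumpable before joining the user namespace. */
            if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) < 0 ||
                setns(userns_fd, CLONE_NEWUSER) < 0) {
                close(userns_fd);
                return -1;
            }
            close(userns_fd);
        }
        if (setns(netns_fd, CLONE_NEWNET) < 0)
            return -1;
    }
    return socket(domain, type, 0);
}

/* Send one descriptor over a UNIX socket as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char dummy = 0;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union { struct cmsghdr h; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns it, or -1 (e.g. EOF if the child failed). */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union { struct cmsghdr h; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    int fd = -1;
    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c && c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/* Fork a helper, build the socket in the child, and hand it back. */
static int socket_via_helper(int netns_fd, int enter_userns, int domain, int type)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        int fd = child_make_socket(netns_fd, enter_userns, domain, type);
        _exit(fd >= 0 && send_fd(sv[1], fd) == 0 ? 0 : 1);
    }
    close(sv[1]);
    int fd = recv_fd(sv[0]);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return fd;
}
```

Every call pays for a fork(), a socketpair(), a descriptor transfer and a waitpid(), which is exactly the per-connection cost being objected to for outgoing connections.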
We're speaking about tens to hundreds of microseconds in the very best case, probably even more sometimes. One intermediate possibility might be to drop everything but CAP_SYS_ADMIN, but even then it leaves a lot of capabilities to an attacker :-/

I really think that the deepest problem is that there's still no userland-friendly way to do socketat() without going through all that mess :-/ The permissions ought to be checked once, and it should be possible to simply create a socket in the same namespace as another one designated by an FD that already passed the permission checks. Maybe it's about time to revive the old discussions around that API, which ended exactly 10 years ago :-/

> Another complaint about the network namespace support is that it only
> supports namespaces in /var/run/netns. My own tool (search for "ctrtool
> ns_open_file" on Google), on the other hand, supports network namespaces
> created in arbitrary locations (and even allows creating sockets in
> arbitrary namespaces that also account for the above two user namespace
> scenarios). It would be nice if HAProxy supported arbitrary network
> namespace locations too, to support the rootless container use case.

I have no opinion on this and didn't even remember this path. I agree that it would make sense to have a tunable in the global section to change this! Are you interested in proposing a patch to do this?

> 2. There is a stack buffer overflow found in one of the files. Not
> disclosing it here because this email will end up on the public mailing
> list. If there is a "security" email address I could disclose it to, what is
> it?

As Lukas mentioned, it's [email protected], and indeed it's not prominent enough.

> 3. There was another feature that I felt was really broken, but since it is
> related to #2 (it's associated with the same file that the stack buffer
> overflow exists on), I'm not disclosing it here publicly either.
> (The issue itself has nothing to do with security, but I will only disclose
> this after #2 has been resolved.)

No problem!

Thanks!
Willy

