Hello everyone,

While porting the latest changes of OpenBSD's relayd to FreeBSD, I stumbled upon an interesting issue: the HCE subprocess of relayd gets a SIGSEGV, bringing down all the other relayd subprocesses. Here's what it looks like on the command line:

```
# relayd  -vd
startup
ca exiting, pid 83388
ca exiting, pid 84099
ca exiting, pid 84794
relay exiting, pid 82084
pfe exiting, pid 81193
relay exiting, pid 82913
relay exiting, pid 82352
lost child: pid 81722 terminated; signal 11
parent terminating, pid 80914
```

And here's the output of `dwatch -X proc` (a handy wrapper around DTrace), where you can see the SIGSEGV killing the HCE subprocess:

```
2022 Jun  2 11:20:48 0.0 relayd[55780]: INIT relayd -P ca -I 2 -vd
2022 Jun  2 11:20:48 913.913 relayd[52326]: SEND SIGSEGV[11] pid 52326 -- relayd: hce
2022 Jun  2 11:20:48 913.913 relayd[52326]: EXIT child terminated abnormally
2022 Jun  2 11:20:48 913.913 relayd[53774]: EXIT child exited
2022 Jun  2 11:20:48 913.913 relayd[52326]: SEND SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[52326]: DISCARD SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[51783]: EXIT child exited
2022 Jun  2 11:20:48 913.913 relayd[53774]: SEND SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[53774]: DISCARD SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[51783]: SEND SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[54135]: EXIT child exited
2022 Jun  2 11:20:48 913.913 relayd[51783]: DISCARD SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[54135]: SEND SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[54281]: EXIT child exited
2022 Jun  2 11:20:48 913.913 relayd[54135]: DISCARD SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[54281]: SEND SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[53164]: EXIT child exited
2022 Jun  2 11:20:48 913.913 relayd[54281]: DISCARD SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[53164]: SEND SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[55780]: EXIT child exited
2022 Jun  2 11:20:48 913.913 relayd[53164]: DISCARD SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[55780]: SEND SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[55780]: DISCARD SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[54959]: EXIT child exited
2022 Jun  2 11:20:48 913.913 relayd[54959]: SEND SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 913.913 relayd[54959]: DISCARD SIGCHLD[20] pid 51541 -- relayd -vd
2022 Jun  2 11:20:48 0.0 relayd[51541]: EXIT child exited
2022 Jun  2 11:20:48 0.0 relayd[51541]: SEND SIGCHLD[20] pid 7946 -- /bin/sh -i
2022 Jun  2 11:20:48 0.0 relayd[51541]: DISCARD SIGCHLD[20] pid 7946 -- /bin/sh -i
```

Here's my relayd.conf:

```
prefork 3

table <localhost-ssl> {
127.0.0.1
}

redirect "localhost-ssl" {
listen on 192.168.2.1 tcp port 9999
forward to <localhost-ssl> port 4443 mode roundrobin check https "/rules.limits" host acs-areq code 200
}
```

Interestingly, if I replace "https" with "http", relayd no longer crashes on startup.

Digging deeper, it turns out that the call to `tls_config_new()` in hce.c fails and its return value is never checked, so a NULL `table->tls_cfg` is passed on to the `tls_config_insecure_*()` calls that follow. A patch could look like this:

```
diff --git a/src/usr.sbin/relayd/hce.c b/src/usr.sbin/relayd/hce.c
index 5233e2c..4a1bf1c 100644
--- a/src/usr.sbin/relayd/hce.c
+++ b/src/usr.sbin/relayd/hce.c
@@ -94,6 +94,9 @@ hce_setup_events(void)
                            table->tls_cfg != NULL)
                                continue;
                        table->tls_cfg = tls_config_new();
+                       if (table->tls_cfg == NULL) {
+                               fatalx("%s: tls_config_new", __func__);
+                       }
                        tls_config_insecure_noverifycert(table->tls_cfg);
                        tls_config_insecure_noverifyname(table->tls_cfg);
                }
```

`tls_config_new()` failed because `pthread_once` returned non-zero in `tls_init()`. Why? I didn't pass `-pthread` when building relayd, which seemed to cause relayd to fall back to a `pthread_once` stub in the FreeBSD libc that simply returns ENOSYS. I only realized that the missing `-pthread` flag was the culprit while preparing this report. I'll make sure that the FreeBSD Ports Collection sets the flags correctly when the time comes for the update.
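
To illustrate the failure mode, here is a minimal sketch of checking the return value of `pthread_once()` rather than assuming it succeeded. This is not code from relayd or LibreSSL; the names `init_once`, `init_done`, `do_init` and `checked_init` are hypothetical and exist only for this example.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical one-time initialization state, for illustration only. */
static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_done;

static void
do_init(void)
{
	init_done = 1;
}

/*
 * Run do_init() once and report a failure instead of ignoring it.
 * Returns 0 on success, -1 on failure.
 */
static int
checked_init(void)
{
	int error;

	/* pthread_once() returns an error number, it does not set errno. */
	error = pthread_once(&init_once, do_init);
	if (error != 0) {
		fprintf(stderr, "pthread_once: %s\n", strerror(error));
		return (-1);
	}

	/* Also confirm that the init routine actually ran. */
	return (init_done ? 0 : -1);
}
```

Built with `-pthread`, `checked_init()` should return 0. If the build falls back to a stub that returns ENOSYS, as I suspect happened in my case, the same call would return -1 instead of silently continuing.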

Here are the versions of software I'm using:

- LibreSSL: 3.4.3
- relayd: https://github.com/swills/relayd/tree/openbsd_catchup_202203 (synced with OpenBSD's e363f310b89, according to the commit log in the swills/relayd repo).
- FreeBSD 13.1

I'll be happy to provide further details if necessary.


Best regards,

Mateusz Piotrowski
