Re: relayd stops processing traffic intermittently

2017-12-28 Thread Sebastian Benoit
ok

Claudio Jeker(cje...@diehard.n-r-g.com) on 2017.12.28 14:21:27 +0100:
> Forwarding to tech, since people may have missed this.
> Hope to commit this soon.
> 
> - Forwarded message from Claudio Jeker <cje...@diehard.n-r-g.com> -
> 
> Date: Sun, 24 Dec 2017 19:07:11 +0100
> From: Claudio Jeker
> To: Mischa Peters
> Subject: Re: relayd stops processing traffic intermittently
> 
> On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote:
> > 
> > > On 23 Dec 2017, at 13:08, Claudio Jeker wrote:
> > > 
> > >> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote:
> > >> Hi All,
> > >> 
> > >> Since OpenBSD 6.2, just confirmed this in the latest snapshot 
> > >> (GENERIC.MP#305) as well, for some reason relayd stops processing 
> > >> traffic and starts flooding the log file with the following message:
> > >> 
> > >> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> > >> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> > >> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> > >> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> > >> [snip]
> > >> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> > >> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> > >> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> > >> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> > >> ...etc...
> > >> 
> > >> Restarting the daemon "fixes" the problem.
> > >> Not sure how to trouble shoot this but I am able to reproduce this 
> > >> consistently by pointing SSLLabs towards relayd.
> > >> Would be great to get some pointers.
> > >> 
> > > 
> > > I have seen this as well on our production systems. This is a problem in
> > > the privsep part of the TLS code. I could not do more testing yet but my
> > > assumption is that a new option / feature is freaking this code out.
> > 
> > Anything I can do or collect to give you more information? 
> 
> So, I think I found the problem. The ca process did not handle errors from
> RSA_private_encrypt correctly. So once you got a bad signature in the
> system chocked and stopped. This diff seems to work for me (against
> SSLlabs).
> 
> Cheers
> -- 
> :wq Claudio
> 
> Index: ca.c
> ===
> RCS file: /cvs/src/usr.sbin/relayd/ca.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 ca.c
> --- ca.c  28 Nov 2017 00:20:23 -  1.31
> +++ ca.c  24 Dec 2017 18:01:20 -
> @@ -266,9 +266,15 @@ ca_dispatch_relay(int fd, struct privsep
>   break;
>   }
>  
> + if (cko.cko_tlen == -1) {
> + char buf[256];
> + log_warnx("%s: %s", __func__,
> + ERR_error_string(ERR_get_error(), buf));
> + }
> +
>   iov[c].iov_base = 
>   iov[c++].iov_len = sizeof(cko);
> - if (cko.cko_tlen) {
> + if (cko.cko_tlen > 0) {
>   iov[c].iov_base = to;
>   iov[c++].iov_len = cko.cko_tlen;
>   }
> @@ -381,12 +387,12 @@ rsae_send_imsg(int flen, const u_char *f
>  
>   IMSG_SIZE_CHECK(, ());
>   memcpy(, imsg.data, sizeof(cko));
> - if (IMSG_DATA_SIZE() !=
> - (sizeof(cko) + cko.cko_tlen))
> - fatalx("data size");
>  
>   ret = cko.cko_tlen;
> - if (ret) {
> + if (ret > 0) {
> + if (IMSG_DATA_SIZE() !=
> + (sizeof(cko) + ret))
> + fatalx("data size");
>   toptr = (u_char *)imsg.data + sizeof(cko);
>   memcpy(to, toptr, ret);
>   }
> 
> 
> - End forwarded message -
> 
> -- 
> :wq Claudio
> 



Re: relayd stops processing traffic intermittently

2017-12-28 Thread Claudio Jeker
Forwarding to tech, since people may have missed this.
Hope to commit this soon.

- Forwarded message from Claudio Jeker <cje...@diehard.n-r-g.com> -

Date: Sun, 24 Dec 2017 19:07:11 +0100
From: Claudio Jeker
To: Mischa Peters
Subject: Re: relayd stops processing traffic intermittently

On Sat, Dec 23, 2017 at 02:04:19PM +0100, Mischa Peters wrote:
> 
> > On 23 Dec 2017, at 13:08, Claudio Jeker wrote:
> > 
> >> On Sat, Dec 23, 2017 at 11:40:57AM +0100, Mischa wrote:
> >> Hi All,
> >> 
> >> Since OpenBSD 6.2, just confirmed this in the latest snapshot 
> >> (GENERIC.MP#305) as well, for some reason relayd stops processing traffic 
> >> and starts flooding the log file with the following message:
> >> 
> >> Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> >> [snip]
> >> Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
> >> Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
> >> ...etc...
> >> 
> >> Restarting the daemon "fixes" the problem.
> >> Not sure how to trouble shoot this but I am able to reproduce this 
> >> consistently by pointing SSLLabs towards relayd.
> >> Would be great to get some pointers.
> >> 
> > 
> > I have seen this as well on our production systems. This is a problem in
> > the privsep part of the TLS code. I could not do more testing yet but my
> > assumption is that a new option / feature is freaking this code out.
> 
> Anything I can do or collect to give you more information? 

So, I think I found the problem. The ca process did not handle errors from
RSA_private_encrypt correctly. So once you got a bad signature in the
system chocked and stopped. This diff seems to work for me (against
SSLlabs).

Cheers
-- 
:wq Claudio

Index: ca.c
===
RCS file: /cvs/src/usr.sbin/relayd/ca.c,v
retrieving revision 1.31
diff -u -p -r1.31 ca.c
--- ca.c28 Nov 2017 00:20:23 -  1.31
+++ ca.c24 Dec 2017 18:01:20 -
@@ -266,9 +266,15 @@ ca_dispatch_relay(int fd, struct privsep
break;
}
 
+   if (cko.cko_tlen == -1) {
+   char buf[256];
+   log_warnx("%s: %s", __func__,
+   ERR_error_string(ERR_get_error(), buf));
+   }
+
iov[c].iov_base = 
iov[c++].iov_len = sizeof(cko);
-   if (cko.cko_tlen) {
+   if (cko.cko_tlen > 0) {
iov[c].iov_base = to;
iov[c++].iov_len = cko.cko_tlen;
}
@@ -381,12 +387,12 @@ rsae_send_imsg(int flen, const u_char *f
 
IMSG_SIZE_CHECK(, ());
memcpy(, imsg.data, sizeof(cko));
-   if (IMSG_DATA_SIZE() !=
-   (sizeof(cko) + cko.cko_tlen))
-   fatalx("data size");
 
ret = cko.cko_tlen;
-   if (ret) {
+   if (ret > 0) {
+   if (IMSG_DATA_SIZE() !=
+   (sizeof(cko) + ret))
+   fatalx("data size");
toptr = (u_char *)imsg.data + sizeof(cko);
memcpy(to, toptr, ret);
}


- End forwarded message -

-- 
:wq Claudio