On 02/12/2015 01:41 AM, Andres Freund wrote:
Hi,

On 2015-02-05 23:03:02 +0200, Heikki Linnakangas wrote:
On 01/26/2015 12:14 PM, Andres Freund wrote:
Hi,

When working on getting rid of ImmediateInterruptOK I wanted to verify
that ssl still works correctly. Turned out it didn't. But neither did it
in master.

Turns out there's two major things we do wrong:

1) We ignore the rule that once called and returning
    SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
    again with the same parameters. Unfortunately that rule doesn't mean
    just that the same parameters have to be passed in, but also that we
    can't just constantly switch between _read()/write(). Especially
    nonblocking backend code (i.e. walsender) and the whole frontend code
    violate this rule.

I don't buy that. Googling around, and looking at the man pages, I just
don't see anything to support that claim. I believe it is supported to
switch between reads and writes.

There's at least two implementations that seem to have workaround to
achieve something like that. Apache's mod_ssl and the ssl layer for
libevent.

2) We start renegotiations in be_tls_write() while in nonblocking mode,
    but don't properly retry to handle socket readyness. There's a loop
    that retries handshakes twenty times (???), but what actually is
    needed is to call SSL_get_error() and ensure that there's actually
    data available.

Yeah, that's just crazy.

Actually, I don't think we need to call SSL_do_handshake() at all. At least
on modern OpenSSL versions. OpenSSL internally uses a state machine, and
SSL_read() and SSL_write() calls will complete the handshake just as well.

Some blaming seems to show that that's been the case for a long time.

2) is easy enough to fix, but 1) is pretty hard. Before anybody says
that 1) isn't an important rule: It reliably causes connection aborts
within a couple renegotiations. The higher the latency the higher the
likelihood of aborts. Even locally it doesn't take very long to
abort. Errors usually are something like "SSL connection has been closed
unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
of other similar messages. There's a couple reports of those in the
archives and I've seen many more in client logs.

Yeah, I can reproduce that. There's clearly something wrong.

I believe this is an OpenSSL bug. I traced the "unexpected message" error to
this code in OpenSSL's s3_read_bytes() function:

Yep, I got to that as well.

    case SSL3_RT_APPLICATION_DATA:
        /*
         * At this point, we were expecting handshake data, but have
         * application data.  If the library was running inside ssl3_read()
         * (i.e. in_read_app_data is set) and it makes sense to read
         * application data at this point (session renegotiation not yet
         * started), we will indulge it.
         */
        if (s->s3->in_read_app_data &&
            (s->s3->total_renegotiations != 0) &&
            (((s->state & SSL_ST_CONNECT) &&
              (s->state >= SSL3_ST_CW_CLNT_HELLO_A) &&
              (s->state <= SSL3_ST_CR_SRVR_HELLO_A)
             ) || ((s->state & SSL_ST_ACCEPT) &&
                   (s->state <= SSL3_ST_SW_HELLO_REQ_A) &&
                   (s->state >= SSL3_ST_SR_CLNT_HELLO_A)
             )
            )) {
            s->s3->in_read_app_data = 2;
            return (-1);
        } else {
            al = SSL_AD_UNEXPECTED_MESSAGE;
            SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_UNEXPECTED_RECORD);
            goto f_err;
        }

So that quite clearly throws an error, *unless* the original application
call was SSL_read(). As you also concluded, OpenSSL doesn't like it when
SSL_write() is called when it's in renegotiation. I think that's OpenSSL's
fault and it should "indulge" the application data message even in
SSL_write().

That'd be nice, but I think there were multiple reports to them about
this and there wasn't much of a response. Maybe it's time to try again.

In theory, I guess we should do similar hacks in the server, and always call
SSL_read() before SSL_write(), but it seems to work without it. Or maybe
not; OpenSSL server and client code is not symmetric, so perhaps it works in
server mode without that.

I think we should do it on the server side - got some server side errors
when I cranked up the pg_receivexlog's status interval and set the wal
sender timeout very low.

I've not been able to reproduce any errors on the server side, with my latest client-only patch. Not sure why. I would assume that the server has the same problem, but perhaps there are some mitigating factors that make it impossible or at least less likely there.

I'm planning to commit and backpatch the first two of the attached patches. The first is essentially the same as what I've posted before, with some minor cleanup. I realized that the hack can be isolated completely in fe-secure-openssl.c by putting the "kill-switch" in the custom BIO_read routine, instead of pqsecure_raw_read(), so I did that. The second patch makes pqSendSome() call pqReadData() also in non-blocking mode. Per discussion, that's necessary to avoid getting stuck, if an application that uses non-blocking mode doesn't call pqConsumeInput() between pqFlush() calls.

For the sake of completeness, the third one applies the same kill-switch hack to the server, preventing SSL_write() from reading anything. However, as it doesn't seem to actually be necessary, and it happens to be a bit more messy to implement in the backend than in libpq, I'm not going to commit that one.

- Heikki

From 0c59577de27ec636026370ddf1ad9df1190883f5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 5 Feb 2015 22:44:58 +0200
Subject: [PATCH 1/3] Fix "sslv3 alert unexpected message" errors in SSL
 renegotiation.

There seems to be a bug in OpenSSL renegotiation, so that when SSL_write()
needs to read data to complete the handshake, but it receives application
data instead, it gets confused and throws an "unexpected message" error. To
work around that, don't let SSL_write() to read data from the socket, even
if some is available. Arrange so that SSL_read() processes all incoming
messages instead.

Reviewed by Andres Freund

diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 945e283..9dd02d5 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -920,11 +920,15 @@ pqSendSome(PGconn *conn, int len)
 			 * to clear the channel eventually because it's blocked trying to
 			 * send data to us.  (This can happen when we are sending a large
 			 * amount of COPY data, and the server has generated lots of
-			 * NOTICE responses.)  To avoid a deadlock situation, we must be
-			 * prepared to accept and buffer incoming data before we try
-			 * again.  Furthermore, it is possible that such incoming data
-			 * might not arrive until after we've gone to sleep.  Therefore,
-			 * we wait for either read ready or write ready.
+			 * NOTICE responses.)  Another scenario is that SSL renegotiation
+			 * is in progress, and the SSL library needs to read a message
+			 * to complete the handshake, before it can send more data.
+			 *
+			 * To avoid a deadlock situation, we must be prepared to accept
+			 * and buffer incoming data before we try again.  Furthermore, it
+			 * is possible that such incoming data might not arrive until
+			 * after we've gone to sleep.  Therefore, we wait for either read
+			 * ready or write ready.
 			 */
 			if (pqReadData(conn) < 0)
 			{
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index a32af34..93b8184 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -78,6 +78,7 @@ static int my_sock_write(BIO *h, const char *buf, int size);
 static BIO_METHOD *my_BIO_s_socket(void);
 static int my_SSL_set_fd(PGconn *conn, int fd);
 
+static bool my_block_raw_read = false;
 
 static bool pq_init_ssl_lib = true;
 static bool pq_init_crypto_lib = true;
@@ -319,8 +320,28 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 	char		sebuf[256];
 	int			err;
 
+	/*
+	 * To work-around an issue with OpenSSL and renegotiation, don't let
+	 * SSL_write() read any incoming data.
+	 *
+	 * If SSL_write() is called, and renegotiation has just been inititated,
+	 * SSL_write() might try to read from the socket to complete the
+	 * handshake. If there was some application data in-flight, it might
+	 * receive the application data instead. That confuses it, and it throws
+	 * an "sslv3 alert unexpected message" error.
+	 *
+	 * We avoid that by setting a kill-switch, my_block_raw_read, which tells
+	 * my_sock_read() to not return any data to the caller, even if some is
+	 * available.
+	 *
+	 * NB: This relies on the calling code to call pqsecure_read(), completing
+	 * the renegotiation handshake, if pqsecure_write() returns 0. Otherwise
+	 * we'll never make progress.
+	 */
 	SOCK_ERRNO_SET(0);
+	my_block_raw_read = true;
 	n = SSL_write(conn->ssl, ptr, len);
+	my_block_raw_read = false;
 	err = SSL_get_error(conn->ssl, n);
 	switch (err)
 	{
@@ -1591,6 +1612,15 @@ my_sock_read(BIO *h, char *buf, int size)
 	int			res;
 	int			save_errno;
 
+	/* If the kill-switch is set, pretend that there is no data. */
+	if (my_block_raw_read)
+	{
+		errno = EWOULDBLOCK;
+		BIO_clear_retry_flags(h);
+		BIO_set_retry_read(h);
+		return -1;
+	}
+
 	res = pqsecure_raw_read((PGconn *) h->ptr, buf, size);
 	save_errno = errno;
 	BIO_clear_retry_flags(h);
-- 
2.1.4

From b03488c0fb28b74a490f164b04705a24f759e779 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 12 Feb 2015 15:37:19 +0200
Subject: [PATCH 2/3] Also drain input buffer in non-blocking mode if send
 fails with EAGAIN.

In blocking mode, pqSendSome() calls pqReadData() to read any NOTICEs and
such that the server may have sent, before sleeping. That avoids a deadlock
e.g. in COPY FROM STDIN, because the server might block on sending the
NOTICEs and not receive our data until we read the NOTICEs. But
non-blocking mode has exactly the same issue, so let's call pqReadData() in
non-blocking mode too.

Reviewed by Andres Freund

diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 9dd02d5..f0ab6cd 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -905,16 +905,6 @@ pqSendSome(PGconn *conn, int len)
 			/*
 			 * We didn't send it all, wait till we can send more.
 			 *
-			 * If the connection is in non-blocking mode we don't wait, but
-			 * return 1 to indicate that data is still pending.
-			 */
-			if (pqIsnonblocking(conn))
-			{
-				result = 1;
-				break;
-			}
-
-			/*
 			 * There are scenarios in which we can't send data because the
 			 * communications channel is full, but we cannot expect the server
 			 * to clear the channel eventually because it's blocked trying to
@@ -935,6 +925,17 @@ pqSendSome(PGconn *conn, int len)
 				result = -1;	/* error message already set up */
 				break;
 			}
+
+			/*
+			 * If the connection is in non-blocking mode we don't wait, but
+			 * return 1 to indicate that data is still pending.
+			 */
+			if (pqIsnonblocking(conn))
+			{
+				result = 1;
+				break;
+			}
+
 			if (pqWait(TRUE, TRUE, conn))
 			{
 				result = -1;
-- 
2.1.4

From 0cf438fca1d00d41f7358bd140346fa813d9ab1e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 12 Feb 2015 18:58:01 +0200
Subject: [PATCH 3/3] WIP: Do the same block-raw-read dance in backend.

This is more complicated than in the client, because the callers of
secure_write() will not automatically do a read, if the write fails.
Need to expose pq_recvbuf() and call it from secure_write(), which is
modularity violation.

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 37af6e4..4a20d80 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -93,6 +93,9 @@ static const char *SSLerrmessage(void);
 /* are we in the middle of a renegotiation? */
 static bool in_ssl_renegotiation = false;
 
+/* kill-switch to have my_sock_read pretend there's no data */
+static bool my_block_raw_read = false;
+
 static SSL_CTX *SSL_context = NULL;
 
 /* ------------------------------------------------------------ */
@@ -602,7 +605,9 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
 	}
 
 	errno = 0;
+	my_block_raw_read = true;
 	n = SSL_write(port->ssl, ptr, len);
+	my_block_raw_read = false;
 	err = SSL_get_error(port->ssl, n);
 	switch (err)
 	{
@@ -695,19 +700,28 @@ static BIO_METHOD my_bio_methods;
 static int
 my_sock_read(BIO *h, char *buf, int size)
 {
-	int			res = 0;
+	int			res;
 
-	if (buf != NULL)
+	if (buf == NULL)
+		return 0;	/* XXX: can this happen? */
+
+	/* If the kill-switch is set, pretend that there is no data. */
+	if (my_block_raw_read)
 	{
-		res = secure_raw_read(((Port *)h->ptr), buf, size);
+		errno = EWOULDBLOCK;
 		BIO_clear_retry_flags(h);
-		if (res <= 0)
+		BIO_set_retry_read(h);
+		return -1;
+	}
+
+	res = secure_raw_read(((Port *)h->ptr), buf, size);
+	BIO_clear_retry_flags(h);
+	if (res <= 0)
+	{
+		/* If we were interrupted, tell caller to retry */
+		if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
 		{
-			/* If we were interrupted, tell caller to retry */
-			if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
-			{
-				BIO_set_retry_read(h);
-			}
+			BIO_set_retry_read(h);
 		}
 	}
 
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 4e7acbe..e7bc52b 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -245,6 +245,9 @@ retry:
 			 * for the socket to become ready again.
 			 */
 		}
+		if (waitfor == WL_SOCKET_READABLE)
+			pq_recvbuf(false);
+
 		goto retry;
 	}
 
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 34efac4..bedc19f 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -843,11 +843,11 @@ socket_set_nonblocking(bool nonblocking)
 /* --------------------------------
  *		pq_recvbuf - load some bytes into the input buffer
  *
- *		returns 0 if OK, EOF if trouble
+ *		returns 0 if OK, EOF if	trouble
  * --------------------------------
  */
-static int
-pq_recvbuf(void)
+int
+pq_recvbuf(bool block)
 {
 	if (PqRecvPointer > 0)
 	{
@@ -863,8 +863,8 @@ pq_recvbuf(void)
 			PqRecvLength = PqRecvPointer = 0;
 	}
 
-	/* Ensure that we're in blocking mode */
-	socket_set_nonblocking(false);
+	/* Ensure that we're in blocking mode (or not) */
+	socket_set_nonblocking(!block);
 
 	/* Can fill buffer from PqRecvLength and upwards */
 	for (;;)
@@ -879,6 +879,12 @@ pq_recvbuf(void)
 			if (errno == EINTR)
 				continue;		/* Ok if interrupted */
 
+			if (!block)
+			{
+				if (errno == EWOULDBLOCK || errno == EAGAIN)
+					return 0;
+			}
+
 			/*
 			 * Careful: an ereport() that tries to write to the client would
 			 * cause recursion to here, leading to stack overflow and core
@@ -914,7 +920,7 @@ pq_getbyte(void)
 
 	while (PqRecvPointer >= PqRecvLength)
 	{
-		if (pq_recvbuf())		/* If nothing in buffer, then recv some */
+		if (pq_recvbuf(true))		/* If nothing in buffer, then recv some */
 			return EOF;			/* Failed to recv data */
 	}
 	return (unsigned char) PqRecvBuffer[PqRecvPointer++];
@@ -933,7 +939,7 @@ pq_peekbyte(void)
 
 	while (PqRecvPointer >= PqRecvLength)
 	{
-		if (pq_recvbuf())		/* If nothing in buffer, then recv some */
+		if (pq_recvbuf(true))	/* If nothing in buffer, then recv some */
 			return EOF;			/* Failed to recv data */
 	}
 	return (unsigned char) PqRecvBuffer[PqRecvPointer];
@@ -1012,7 +1018,7 @@ pq_getbytes(char *s, size_t len)
 	{
 		while (PqRecvPointer >= PqRecvLength)
 		{
-			if (pq_recvbuf())	/* If nothing in buffer, then recv some */
+			if (pq_recvbuf(true))	/* If nothing in buffer, then recv some */
 				return EOF;		/* Failed to recv data */
 		}
 		amount = PqRecvLength - PqRecvPointer;
@@ -1046,7 +1052,7 @@ pq_discardbytes(size_t len)
 	{
 		while (PqRecvPointer >= PqRecvLength)
 		{
-			if (pq_recvbuf())	/* If nothing in buffer, then recv some */
+			if (pq_recvbuf(true))	/* If nothing in buffer, then recv some */
 				return EOF;		/* Failed to recv data */
 		}
 		amount = PqRecvLength - PqRecvPointer;
@@ -1087,7 +1093,7 @@ pq_getstring(StringInfo s)
 	{
 		while (PqRecvPointer >= PqRecvLength)
 		{
-			if (pq_recvbuf())	/* If nothing in buffer, then recv some */
+			if (pq_recvbuf(true))	/* If nothing in buffer, then recv some */
 				return EOF;		/* Failed to recv data */
 		}
 
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index af4ba2a..152e30e 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -1,3 +1,4 @@
+
 /*-------------------------------------------------------------------------
  *
  * libpq.h
@@ -86,6 +87,7 @@ extern int	pq_getbyte(void);
 extern int	pq_peekbyte(void);
 extern int	pq_getbyte_if_available(unsigned char *c);
 extern int	pq_putbytes(const char *s, size_t len);
+extern int	pq_recvbuf(bool block);
 
 /*
  * prototypes for functions in be-secure.c
-- 
2.1.4

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to