Re: [HACKERS] libpq SSL with non-blocking sockets

2011-07-25 Thread Martin Pihlak
On 07/24/2011 11:33 PM, Tom Lane wrote:
 I've applied the simplified fix (just set SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER)
 as well as a patch to improve the error reporting situation.
 

Cool that this turned out to be a one-line fix. Thanks!

regards,
Martin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-07-24 Thread Martin Pihlak
On 07/16/2011 12:46 AM, Tom Lane wrote:
 I think the direction to move in ought to be to use the existing buffer
 as-is, and have pqCheckOutBufferSpace refuse to enlarge the buffer while
 we are in write frozen state.  It should be OK to append data to the
 buffer, though, so long as we remember how much we're allowed to pass to
 SSL_write when we next try to write.

Alternative to freezing the outBuffer would be to set
SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER mode during SSL initialisation.
That would enable the buffer address to be changed in-between the
SSL_write calls, so long as the content remains the same. Attached
is a patch that uses the single buffer approach described by Tom, but
with a moving SSL write buffer enabled.

Modifying pqCheckOutBufferSpace is also an option, but it'd break some
(arguably already broken) client applications that don't have proper
retry handling. Notably some versions of psycopg2 have problems with
handling zero return values from PQputCopyData. So ISTM that from
backporting perspective the moving write buffer is a bit safer.

I'll see if I can come up with a test case for the SSL_read retry before
attempting to fix that.

regards,
Martin
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 17dde4a..977e37c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -793,20 +793,26 @@ pqSendSome(PGconn *conn, int len)
 	while (len  0)
 	{
 		int			sent;
+		int			write_size = len;
 		char		sebuf[256];
 
-#ifndef WIN32
-		sent = pqsecure_write(conn, ptr, len);
-#else
-
+#ifdef WIN32
 		/*
 		 * Windows can fail on large sends, per KB article Q201213. The
 		 * failure-point appears to be different in different versions of
 		 * Windows, but 64k should always be safe.
 		 */
-		sent = pqsecure_write(conn, ptr, Min(len, 65536));
+		write_size = Min(len, 65536);
+#endif
+
+#ifdef USE_SSL
+		/* We have leftovers from previous SSL write, send these first. */
+		if (conn-ssl_retry_bytes)
+			write_size = conn-ssl_retry_bytes;
 #endif
 
+		sent = pqsecure_write(conn, ptr, write_size);
+
 		if (sent  0)
 		{
 			/*
@@ -857,11 +863,30 @@ pqSendSome(PGconn *conn, int len)
 	return -1;
 			}
 		}
+#ifdef USE_SSL
+		else if (sent == 0)
+		{
+			/*
+			 * We are in non-blocking mode, set up for retrying a SSL write
+			 * operation. OpenSSL requires the same buffer to be passed to the
+			 * write retry, so we must remember the write size. However, we
+			 * don't need to keep the original buffer address as the 
+			 * SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER mode ought to be set.
+			 */
+			if (!conn-ssl_retry_bytes)
+conn-ssl_retry_bytes = write_size;
+		}
+#endif
 		else
 		{
 			ptr += sent;
 			len -= sent;
 			remaining -= sent;
+#ifdef USE_SSL
+			/* Forget the retry buffer size if all the data has been sent */
+			if (conn-ssl_retry_bytes)
+conn-ssl_retry_bytes = 0;
+#endif
 		}
 
 		if (len  0)
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index cd1292c..540561d 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -757,6 +757,9 @@ init_ssl_system(PGconn *conn)
 #endif
 			return -1;
 		}
+
+		/* Enable moving write buffer to simplify write retry handling. */
+		SSL_CTX_set_mode(SSL_context, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);
 	}
 
 #ifdef ENABLE_THREAD_SAFETY
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index d56ef5d..ae212a6 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -407,6 +407,8 @@ struct pg_conn
 	X509	   *peer;			/* X509 cert of server */
 	char		peer_dn[256 + 1];		/* peer distinguished name */
 	char		peer_cn[SM_USER + 1];	/* peer common name */
+
+	int			ssl_retry_bytes;	/* buffer size for SSL write retry */
 #ifdef USE_SSL_ENGINE
 	ENGINE	   *engine;			/* SSL engine, if any */
 #else

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-07-24 Thread Tom Lane
Martin Pihlak martin.pih...@gmail.com writes:
 On 07/16/2011 12:46 AM, Tom Lane wrote:
 I think the direction to move in ought to be to use the existing buffer
 as-is, and have pqCheckOutBufferSpace refuse to enlarge the buffer while
 we are in write frozen state.  It should be OK to append data to the
 buffer, though, so long as we remember how much we're allowed to pass to
 SSL_write when we next try to write.

 Alternative to freezing the outBuffer would be to set
 SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER mode during SSL initialisation.

Hmmm ... that is a really interesting flag.  The documentation is
inadequate, but I googled a bit and found this thread about it:
http://www.mail-archive.com/openssl-users@openssl.org/msg45440.html
in which it's stated that (1) the prohibition on moving the buffer is
actually just a sort of sanity check, and can be turned off using this
flag; (2) it is okay to increase, but not reduce, the length parameter
to SSL_write in the next call after a WANT_READ/WANT_WRITE return.

So this does look like it'd fix the issue for SSL_write, without needing
to introduce a concept of a write frozen buffer state.  I am wondering
though how far back support for this flag exists in OpenSSL, and whether
the situation is the same for SSL_read or not.  I don't see any
SSL_MODE_ACCEPT_MOVING_READ_BUFFER macro ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-07-24 Thread Tom Lane
I wrote:
 So this does look like it'd fix the issue for SSL_write, without needing
 to introduce a concept of a write frozen buffer state.  I am wondering
 though how far back support for this flag exists in OpenSSL,

A bit of archaeology reveals that the flag was introduced in OpenSSL
0.9.4, released in 1999.  So it's probably Old Enough.  (BTW, the 0.9.4
changelog credits this change to one Bodo Moeller ... so the comments
from him in the other thread I linked to can be considered authoritative
...)

Still wondering about the SSL_read end of it, though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-07-24 Thread Tom Lane
I wrote:
 Still wondering about the SSL_read end of it, though.

And on that front, some digging around in the OpenSSL source code
indicates that they do all their work in internal buffers, and transfer
data into SSL_read's result buffer only when ready to return it.
So the claim in the documentation that SSL_read has a restriction
comparable to SSL_write is a lie: there is no case where they'll copy
some data into the buffer and then return -1.

So the SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER solution looks like a good
fix.  I'll see about applying it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-07-24 Thread Tom Lane
Martin Pihlak martin.pih...@gmail.com writes:
 If possible I would like the fix to be backported as well. This is
 quite a nasty bug and difficult to track down. Especially as the
 actual SSL error messages are masked by the server closed the
 connection unexpectedly message.

I've applied the simplified fix (just set SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER)
as well as a patch to improve the error reporting situation.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-07-24 Thread Peter Geoghegan
On 24 July 2011 21:33, Tom Lane t...@sss.pgh.pa.us wrote:
 I've applied the simplified fix (just set SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER)
 as well as a patch to improve the error reporting situation.

I'm not exactly sure why, and don't have time to investigate right
now, but this commit (fee476da952a1f02f7ccf6e233fb4824c2bf6af4)
appears to have broken the build for me:

gcc -O0 -g -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wformat-security
-fno-strict-aliasing -fwrapv -g -pthread  -D_REENTRANT -D_THREAD_SAFE
-D_POSIX_PTHREAD_SEMANTICS -fpic -DFRONTEND -DUNSAFE_STAT_OK -I.
-I../../../src/include -D_GNU_SOURCE  -I../../../src/port
-I../../../src/port -DSO_MAJOR_VERSION=5  -c -o fe-misc.o fe-misc.c
-MMD -MP -MF .deps/fe-misc.Po
fe-misc.c: In function ‘pqReadData’:
fe-misc.c:651:11: error: ‘PGconn’ has no member named ‘ssl’
fe-misc.c:743:11: error: ‘PGconn’ has no member named ‘ssl’
fe-misc.c:761:10: error: ‘PGconn’ has no member named ‘ssl’
fe-misc.c: In function ‘pqSendSome’:
fe-misc.c:841:14: error: ‘PGconn’ has no member named ‘ssl’
fe-misc.c:861:14: error: ‘PGconn’ has no member named ‘ssl’

The problem goes away if I check out the commit made immediately prior
to this one - d0c23026b2499ba9d6797359241ade076a5a677d. I'm building
with my usual, rather generic settings for hacking.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-07-24 Thread Andrew Dunstan



On 07/24/2011 07:49 PM, Peter Geoghegan wrote:

On 24 July 2011 21:33, Tom Lanet...@sss.pgh.pa.us  wrote:

I've applied the simplified fix (just set SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER)
as well as a patch to improve the error reporting situation.

I'm not exactly sure why, and don't have time to investigate right
now, but this commit (fee476da952a1f02f7ccf6e233fb4824c2bf6af4)
appears to have broken the build for me:




Yeah, looks like it fails unless --with-openssl is set.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-07-15 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Jul 8, 2011 at 9:36 AM, Martin Pihlak martin.pih...@gmail.com wrote:
 On 07/03/2011 05:08 AM, Steve Singer wrote:
 Since the original patch was submitted as a WIP patch and this version
 wasn't sent until well into the commit fest I am not sure if it
 qualifies for a committer during this commitfest or if it needs to wait
 until the next one.

 If possible I would like the fix to be backported as well. This is
 quite a nasty bug and difficult to track down. Especially as the
 actual SSL error messages are masked by the server closed the
 connection unexpectedly message.

 I would not be inclined to back-patch this straight away.  I agree
 it's an important fix, but I am a little worried about the chances of
 breaking something else...  then again, I don't have the only vote
 here.

I reviewed this patch a bit.  I agree that we have a bug here that
should be fixed and backported, but I don't think this patch is ready.
Some problems:

1. The behavior under low-memory conditions is pretty intolerable.
As coded, a malloc failure here:

+conn-outBufSize = Max(16 * 1024, remaining);
+conn-outBuffer = malloc(conn-outBufSize);
+if (conn-outBuffer == NULL)
+{
+printfPQExpBuffer(conn-errorMessage,
+  cannot allocate memory for output 
buffer\n);
+return -1;
+}

leaves libpq's internal state completely corrupt --- outBuffer is null,
which will result in instant coredump on any future access, but that's
just the tip of the iceberg because we've also lost buffered data and
messed up the none-too-clearly-defined-anyway state of which pending
data is in which buffer.

In general, I don't think it's a smart idea to proceed by duplicating
the buffer contents in this situation, particularly not if the most
obvious way to cause the problem is to have a very large buffer :-(.
I think the direction to move in ought to be to use the existing buffer
as-is, and have pqCheckOutBufferSpace refuse to enlarge the buffer while
we are in write frozen state.  It should be OK to append data to the
buffer, though, so long as we remember how much we're allowed to pass to
SSL_write when we next try to write.

2. According to the OpenSSL man pages, *both* SSL_read and SSL_write are
defined to need to be re-called with the identical arguments after a
WANT_READ or WANT_WRITE return.  This means that pqReadData() is also at
risk for this type of bug.  I believe it could only happen in code paths
where we call pqReadData when there is already some data in the buffer:
if the caller then consumes some of that data, the next call to
pqReadData would try to compact the buffer contents before making the
pqsecure_read call.  I think we probably need a read frozen state in
which we won't enlarge or compact the inBuffer until SSL_read succeeds.

3. As of 9.0, somebody's decided that the backend needs nonblock read
semantics in some cases.  I probably should have complained harder about
that before it went in, but since it's in, the backend is at risk for
this same type of issue in secure_read().

I will mark the patch Returned With Feedback.  I think that we need to
address all these issues before we consider applying it.  If we weren't
hard up against the end of the CommitFest I might have a go at it
myself, but I can't justify the time right now.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-07-08 Thread Martin Pihlak
On 07/03/2011 05:08 AM, Steve Singer wrote:
 Since the original patch was submitted as a WIP patch and this version
 wasn't sent until well into the commit fest I am not sure if it
 qualifies for a committer during this commitfest or if it needs to wait
 until the next one.
 

If possible I would like the fix to be backported as well. This is
quite a nasty bug and difficult to track down. Especially as the
actual SSL error messages are masked by the server closed the
connection unexpectedly message.

regards,
Martin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-07-08 Thread Robert Haas
On Fri, Jul 8, 2011 at 9:36 AM, Martin Pihlak martin.pih...@gmail.com wrote:
 On 07/03/2011 05:08 AM, Steve Singer wrote:
 Since the original patch was submitted as a WIP patch and this version
 wasn't sent until well into the commit fest I am not sure if it
 qualifies for a committer during this commitfest or if it needs to wait
 until the next one.

 If possible I would like the fix to be backported as well. This is
 quite a nasty bug and difficult to track down. Especially as the
 actual SSL error messages are masked by the server closed the
 connection unexpectedly message.

I would not be inclined to back-patch this straight away.  I agree
it's an important fix, but I am a little worried about the chances of
breaking something else...  then again, I don't have the only vote
here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-07-02 Thread Steve Singer

On 11-06-28 02:14 PM, Martin Pihlak wrote:

Thanks for the review!
I have since simplified the patch to assume that partial SSL writes are
disabled -- according to SSL_write(3) this is the default behaviour.
Now the SSL retry buffer only holds the data to be retried, the
remainder is moved to the new outBuffer.



That sounds okay.  Does it make sense to add in a check to verify that 
SSL didn't send a partial write?  I don't know how bad openssl is about 
changing default behaviours or if we are concerned about protecting 
against someone changing the SSL parameters.  My inclination is that 
this isn't needed but I'll raise the issue.

Fixed.

New version of the patch attached.



Otherwise this version of the patch looks good to me.

The only testing I have done is running the test program you sent 
earlier on in the thread and verified that the regression tests all 
pass.  Other than something like your test program I'm not sure how else 
this bug can be induced.


Since the original patch was submitted as a WIP patch and this version 
wasn't sent until well into the commit fest I am not sure if it 
qualifies for a committer during this commitfest or if it needs to wait 
until the next one.







regards,
Martin







Re: [HACKERS] libpq SSL with non-blocking sockets

2011-06-30 Thread Steve Singer

On 11-06-28 02:14 PM, Martin Pihlak wrote:


Hmm, I thought I thought about that. There was a check in the original
patch: if (conn-sslRetryBytes || (conn-outCount - remaining)  0)
So if the SSL retry buffer was emptied it would return 1 if there was
something left in the regular output buffer. Or did I miss something
there?



The issue I saw in the original patch was that at that point in the 
function, sslRetryBytes could be zero (if the data was sent) but  
conn-outCount - remaining would be an incorrect value since remaining 
is the number of bytes left to send from sslRetryBuf NOT 
conn-outBuffer.   This is no longer an issue in the updated patch.

I will try to take a closer look at your updated patch in the next few days.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-06-28 Thread Martin Pihlak
On 06/25/2011 12:14 AM, Steve Singer wrote:
 I'm not a libpq guru but I've given your patch a look over.
 

Thanks for the review!

I have since simplified the patch to assume that partial SSL writes are
disabled -- according to SSL_write(3) this is the default behaviour.
Now the SSL retry buffer only holds the data to be retried, the
remainder is moved to the new outBuffer.

 -The patch doesn't compile with non-ssl builds,  the debug at the bottom
 of PQSendSome isn't in an #ifdef
 

Ack, there was another one in pqFlush. Fixed that.

 -I don't think your handling the return code properly.   Consider this case.
 
 pqSendSome(some data)
   sslRetryBuf = some Data
   return 1
 pqSendSome(more data)
   it sends all of 'some data'
   returns 0
 
 I think 1 should be returned because all of 'more data' still needs to
 be sent.  I think returning a 0 will break PQsetnonblocking if you call
 it when there is data in both sslRetryBuf and outputBuffer.

Hmm, I thought I thought about that. There was a check in the original
patch: if (conn-sslRetryBytes || (conn-outCount - remaining)  0)
So if the SSL retry buffer was emptied it would return 1 if there was
something left in the regular output buffer. Or did I miss something
there?

 We might even want to try sending the data in outputBuffer after we've
 sent all the data sitting in sslRetryBuf.
 

IMHO that'd add unnecessary complexity to the pqSendSome. As this only
happens in non-blocking mode the caller should be well prepared to
handle the retry.

 If you close the connection with an outstanding sslRetryBuf you need to
 free it.

Fixed.

New version of the patch attached.

regards,
Martin
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 9e4807e..8dc0226 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2792,6 +2792,10 @@ freePGconn(PGconn *conn)
 	if (conn-gsslib)
 		free(conn-gsslib);
 #endif
+#if defined(USE_SSL)
+	if (conn-sslOutBufPtr)
+		free(conn-sslOutBufPtr);
+#endif
 	/* Note that conn-Pfdebug is not ours to close or free */
 	if (conn-last_query)
 		free(conn-last_query);
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 17dde4a..d0c7812 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -781,6 +781,8 @@ pqSendSome(PGconn *conn, int len)
 	char	   *ptr = conn-outBuffer;
 	int			remaining = conn-outCount;
 	int			result = 0;
+	int			sent = 0;
+	bool		shiftOutBuffer = true;
 
 	if (conn-sock  0)
 	{
@@ -789,23 +791,34 @@ pqSendSome(PGconn *conn, int len)
 		return -1;
 	}
 
+#ifdef USE_SSL
+	if (conn-sslRetryPos)
+	{
+		/* 
+		 * We have some leftovers from previous SSL write attempts, these need
+		 * to be sent before the regular output buffer.
+		 */
+		len = remaining = conn-sslRetrySize;
+		ptr = conn-sslRetryPos;
+		shiftOutBuffer = false;
+	}
+#endif
+
 	/* while there's still data to send */
 	while (len  0)
 	{
-		int			sent;
 		char		sebuf[256];
+		int			write_size = len;
 
-#ifndef WIN32
-		sent = pqsecure_write(conn, ptr, len);
-#else
-
+#ifdef WIN32
 		/*
 		 * Windows can fail on large sends, per KB article Q201213. The
 		 * failure-point appears to be different in different versions of
 		 * Windows, but 64k should always be safe.
 		 */
-		sent = pqsecure_write(conn, ptr, Min(len, 65536));
+		write_size = Min(len, 65536);
 #endif
+		sent = pqsecure_write(conn, ptr, write_size);
 
 		if (sent  0)
 		{
@@ -857,6 +870,38 @@ pqSendSome(PGconn *conn, int len)
 	return -1;
 			}
 		}
+#ifdef USE_SSL
+		else if (sent == 0  conn-ssl  !conn-sslRetryPos)
+		{
+			/*
+			 * pqsecure_write indicates that a SSL write needs to be retried.
+			 * With non-blocking connections we need to ensure that the buffer
+			 * passed to the SSL_write() retry is the the exact same buffer as
+			 * in previous write -- see SSL_write(3SSL) for more on this. For
+			 * this we need to save the the original buffer and allocate a new
+			 * buffer for outgoing data.
+			 */
+			conn-sslOutBufPtr = conn-outBuffer;
+			conn-sslRetryPos = ptr;
+			conn-sslRetrySize = write_size;
+
+			/*
+			 * Allocate a new output buffer and prepare to move the remainder
+			 * of the original outBuffer there.
+			 */
+			remaining -= write_size;
+			ptr += write_size;
+
+			conn-outBufSize = Max(16 * 1024, remaining);
+			conn-outBuffer = malloc(conn-outBufSize);
+			if (conn-outBuffer == NULL)
+			{
+printfPQExpBuffer(conn-errorMessage,
+  cannot allocate memory for output buffer\n);
+return -1;
+			}
+		}
+#endif
 		else
 		{
 			ptr += sent;
@@ -903,10 +948,36 @@ pqSendSome(PGconn *conn, int len)
 		}
 	}
 
-	/* shift the remaining contents of the buffer */
-	if (remaining  0)
-		memmove(conn-outBuffer, ptr, remaining);
-	conn-outCount = remaining;
+#ifdef USE_SSL
+	if (conn-sslRetryPos  sent  0)
+	{
+		/*
+		 * A SSL write was successfully retried, free the retry buffer and
+		 * switch to the regular outBuffer. 

Re: [HACKERS] libpq SSL with non-blocking sockets

2011-06-27 Thread Robert Haas
On Fri, Jun 24, 2011 at 5:14 PM, Steve Singer ssinger...@sympatico.ca wrote:
 A few things I noticed (that you might be aware of since you mentioned it
 needs cleanup)

 -The patch doesn't compile with non-ssl builds,  the debug at the bottom of
 PQSendSome isn't in an #ifdef

 -I don't think your handling the return code properly.   Consider this case.

 pqSendSome(some data)
   sslRetryBuf = some Data
   return 1
 pqSendSome(more data)
   it sends all of 'some data'
   returns 0

 I think 1 should be returned because all of 'more data' still needs to be
 sent.  I think returning a 0 will break PQsetnonblocking if you call it when
 there is data in both sslRetryBuf and outputBuffer.
 We might even want to try sending the data in outputBuffer after we've sent
 all the data sitting in sslRetryBuf.


 If you close the connection with an outstanding sslRetryBuf you need to free
 it.

Based on these comments, I have updated the status of the patch to
Waiting on Author.

https://commitfest.postgresql.org/action/patch_view?id=594

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq SSL with non-blocking sockets

2011-06-24 Thread Steve Singer

On 11-06-15 03:20 PM, Martin Pihlak wrote:


Yes, that sounds like a good idea -- especially considering that COPY
is not the only operation that can cause SSL_write retries.


This is of course still work in progress, needs cleaning up and
definitely more testing. But at this point before going any further,
I'd really appreciate a quick review from resident libpq gurus.



Martin,

I'm not a libpq guru but I've given your patch a look over.

The idea of storing the original buffer in new members of connection 
structure for later re-use seems like a good way to approach this.


A few things I noticed (that you might be aware of since you mentioned 
it needs cleanup)


-The patch doesn't compile with non-ssl builds,  the debug at the bottom 
of PQSendSome isn't in an #ifdef


-I don't think your handling the return code properly.   Consider this case.

pqSendSome(some data)
  sslRetryBuf = some Data
  return 1
pqSendSome(more data)
  it sends all of 'some data'
  returns 0

I think 1 should be returned because all of 'more data' still needs to 
be sent.  I think returning a 0 will break PQsetnonblocking if you call 
it when there is data in both sslRetryBuf and outputBuffer.
We might even want to try sending the data in outputBuffer after we've 
sent all the data sitting in sslRetryBuf.



If you close the connection with an outstanding sslRetryBuf you need to 
free it.



Other than running your test program I didn't do any testing with this 
patch.


Steve


regards,
Martin







Re: [HACKERS] libpq SSL with non-blocking sockets

2011-06-15 Thread Martin Pihlak
On 06/12/2011 04:22 AM, Robert Haas wrote:
 One idea is that we could add outBuffer2/outBufSize2 to struct
 pg_conn, or something along those lines with less obviously stupid
 naming.  Normally those would be unused, but in the special case where
 SSL indicates that we must retry the call with the same arguments, we
 set a flag that freezes the out buffer and forces any further data
 to be stuffed into outBuffer2.  If or when the operation finally
 succeeds, we then move the data from outBuffer2 into outBuffer.
 

Yes, that sounds like a good idea -- especially considering that COPY
is not the only operation that can cause SSL_write retries.

Attached is a first attempt at a patch to implement the described two
buffer approach. This modifies pqSendSome so that whenever a SSL
write retry is needed it saves the current outBuffer with its length
and attempted write size to connection's sslRetry* variables. A new
outBuffer is then allocated and used for any further data pushing.

After the SSL write retry buffer is set up, any further calls to
pqSendSome will first attempt to send the contents of the retry buffer,
returning 1 to indicate that not all of the data could be sent. If the
retry buffer is finally emptied it is freed and pqSendSome starts
sending from the regular outBuffer.

This is of course still work in progress, needs cleaning up and
definitely more testing. But at this point before going any further,
I'd really appreciate a quick review from resident libpq gurus.

regards,
Martin
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 17dde4a..b5e62c9 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -781,6 +781,8 @@ pqSendSome(PGconn *conn, int len)
 	char	   *ptr = conn-outBuffer;
 	int			remaining = conn-outCount;
 	int			result = 0;
+	int			sent = 0;
+	bool		usingSSLWriteBuffer = false;
 
 	if (conn-sock  0)
 	{
@@ -789,10 +791,28 @@ pqSendSome(PGconn *conn, int len)
 		return -1;
 	}
 
+#ifdef USE_SSL
+	if (conn-sslRetryPos)
+	{
+		/* 
+		 * We have some leftovers from previous SSL write attempts, send those
+		 * instead of the regular output buffer.
+		 */
+		len = conn-sslRetryBytes;
+		ptr = conn-sslRetryPos;
+		remaining = conn-sslRetryBufSize;
+
+		if (conn-Pfdebug)
+			fprintf(conn-Pfdebug, pqSendSome: using SSL write retry buffer: %p retry: %d total: %d\n,
+ptr, len, remaining);
+
+		usingSSLWriteBuffer = true;
+	}
+#endif
+
 	/* while there's still data to send */
 	while (len  0)
 	{
-		int			sent;
 		char		sebuf[256];
 
 #ifndef WIN32
@@ -807,6 +827,10 @@ pqSendSome(PGconn *conn, int len)
 		sent = pqsecure_write(conn, ptr, Min(len, 65536));
 #endif
 
+		if (conn-Pfdebug)
+			fprintf(conn-Pfdebug, pqSendSome: write buf: %p len: %d sent: %d\n,
+	ptr, len, sent);
+
 		if (sent  0)
 		{
 			/*
@@ -857,6 +881,35 @@ pqSendSome(PGconn *conn, int len)
 	return -1;
 			}
 		}
+#ifdef USE_SSL
+		else if (sent == 0  conn-ssl)
+		{
+			/*
+			 * With non-blocking SSL connections we need to ensure that the
+			 * buffer passed to the pqsecure_write() retry is the the exact
+			 * same buffer as in previous write -- see SSL_write(3SSL) for more
+			 * on this. For this we need to keep the the original buffer and
+			 * remember its length. Also a new outBuffer is needed, but the
+			 * actual allocation is deferred to the end of the function.
+			 *
+			 * For non-SSL connections none of this is needed.
+			 */
+			if (!conn-sslRetryPos)
+			{
+if (conn-Pfdebug)
+	fprintf(conn-Pfdebug, pqSendSome: SSL retry setup: buf: %p len: %d remain: %d\n,
+		conn-sslRetryBuf, len, remaining);
+
+conn-sslRetryBuf = conn-outBuffer;
+conn-sslRetryPos = ptr;
+conn-sslRetryBytes = len;
+conn-sslRetryBufSize = remaining;
+conn-outBuffer = NULL;
+			} 
+
+			usingSSLWriteBuffer = true;
+		}
+#endif
 		else
 		{
 			ptr += sent;
@@ -903,10 +956,73 @@ pqSendSome(PGconn *conn, int len)
 		}
 	}
 
-	/* shift the remaining contents of the buffer */
-	if (remaining  0)
-		memmove(conn-outBuffer, ptr, remaining);
-	conn-outCount = remaining;
+#ifdef USE_SSL
+	if (conn-sslRetryPos  sent  0)
+	{
+		/*
+		 * A SSL write was successfully retried, advance the retry buffer
+		 * position to the rest of the buffer. Free the buffer if all of its
+		 * contents are delivered.
+		 */
+		conn-sslRetryPos = ptr;
+		conn-sslRetryBytes = remaining;
+		conn-sslRetryBufSize = remaining;
+
+		if (conn-sslRetryBufSize = 0)
+		{
+			if (conn-Pfdebug)
+fprintf(conn-Pfdebug, SSL retry clean-up: freed buf=%p\n,
+	conn-sslRetryBuf);
+
+			free(conn-sslRetryBuf);
+			conn-sslRetryBytes = conn-sslRetryBufSize = 0;
+			conn-sslRetryBuf = conn-sslRetryPos = NULL;
+		}
+
+		if (conn-sslRetryBytes || (conn-outCount - remaining)  0)
+		{
+			/*
+			 * We still have some data left in the SSL retry buffers or the 
+			 * outBuffer. Return 1 to indicate that further retries are needed
+			 * to flush the entire output buffer.
+			 */
+			

Re: [HACKERS] libpq SSL with non-blocking sockets

2011-06-11 Thread Robert Haas
On Thu, Jun 9, 2011 at 10:39 AM, Martin Pihlak martin.pih...@gmail.com wrote:
 In PQputCopyData #2 it is visible that the first SSL_write called from
 pqFlush failed with SSL_ERROR_WANT_WRITE. The next SSL_write should
 have been a retry with the same parameters, but instead was passed a
 buffer with a different address and length. Hence the bad write
 retry. Some googling turned out similar issues for other projects
 using SSL with non-blocking sockets.

I'm not surprised.  Setting the API so that it requires the same
buffer (and not just a buffer pointing to the same data, or with the
same initial contents) was maybe not the greatest design decision.

 The possible workarounds are to disable SSL or to disable non-blocking
 libpq connections. Both are not always possible - security reasons, 3rd
 party applications, drivers, etc. So I think this should be fixed in
 libpq. Not sure exactly how though. It would seem that for the
 PQputCopyData the best would be to return 0 to indicate that the
 operation should be retried. No idea for the other possible cases of
 SSL_write() retry though. What do you think?

It seems to me we should try to paper over the problem within libpq
rather than allowing it to percolate any higher up the call stack.
One idea is that we could add outBuffer2/outBufSize2 to struct
pg_conn, or something along those lines with less obviously stupid
naming.  Normally those would be unused, but in the special case where
SSL indicates that we must retry the call with the same arguments, we
set a flag that freezes the out buffer and forces any further data
to be stuffed into outBuffer2.  If or when the operation finally
succeeds, we then move the data from outBuffer2 into outBuffer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] libpq SSL with non-blocking sockets

2011-06-09 Thread Martin Pihlak
I believe I have found a bug in libpq COPY handling with non-blocking
SSL connections. The bug manifests itself by dropping the connection
in PGputCopyData() with server closed the connection unexpectedly
message. The connection drop only occurs with nonblocking connections -
blocking connections work as expected.

I'll skip a lot of debugging details, but what I found out was that
the connection drop was caused by a SSL bad write retry from
SSL_write(). Which in turn appears to be caused by bad data passed to
SSL_write() retry. So, what happened was that at some point SSL_write()
returned SSL_ERROR_WANT_WRITE meaning that the write needs to be
retried. Now, according to OpenSSL manual the retry SSL_write needs to
be passed the same arguments as the previous call:
http://www.openssl.org/docs/ssl/SSL_write.html#WARNING

However the next SSL_write was passed another buffer with a different
length and it failed with SSL_ERROR_SSL bad write retry. Which in
turn caused the pqSendSome to drop the connection and failed the
PQputCopyData. Actually, now that I think of it, this is probably not
only related to COPY handling ...

The connection drop can be reproduced by introducing some network
problems to the connection, so that the SSL_write needs to be retried.
This can be most easily accomplished by passing a huge buffer to
PQputCopyData. For instance on my laptop I can do about 3 PQputCopyData
with a 100MB buffer before the connection is dropped. Others results may
vary. Also, it doesn't seem to matter what the libpq version is
used - I initially started on 8.3, but later confirmed that the same
problem exists on 8.4, 9.0 and 9.1. Tested with OpenSSL 0.9.8g and
0.9.8o, no idea about other versions.

The following is a log excerpt from my test program (attached) with some
extra logging added to libpq:

PQputCopyData #0 buf:1 total:0
  pqFlush()
  pqFlush() = 0
  pqPutMsgEnd()
pqSendSome(9744)
  SSL_write(0x7e59d60, 0x7ff5a7eac010, 9744) = 9744:
SSL_error: 0, errno: 0
  pqPutMsgEnd() = 0
PQputCopyData #1 buf:1 total:1
  pqPutMsgEnd()
pqSendSome(9744)
  SSL_write(0x7e59d60, 0x7ff5a7eac010, 9744) = -1: SSL_error: 3,
errno: 11
  pqPutMsgEnd() = 0
PQputCopyData #2 buf:1 total:2
  pqFlush()
pqSendSome(10266)
  SSL_write(0x7e59d60, 0x7ff5a7eac010, 10266) = -1: SSL_error:
3, errno: 11
  pqFlush() = 1
  pqPutMsgEnd()
pqSendSome(19488)
  SSL_write(0x7e59d60, 0x7ff597eab010, 19488) = -1: SSL_error:
1, errno: 0
  SSL error: bad write retry
  pqPutMsgEnd() = -1
PQputCopyData() = -1: server closed the connection unexpectedly

In PQputCopyData #2 it is visible that the first SSL_write called from
pqFlush failed with SSL_ERROR_WANT_WRITE. The next SSL_write should
have been a retry with the same parameters, but instead was passed a
buffer with a different address and length. Hence the bad write
retry. Some googling turned out similar issues for other projects
using SSL with non-blocking sockets.

The possible workarounds are to disable SSL or to disable non-blocking
libpq connections. Both are not always possible - security reasons, 3rd
party applications, drivers, etc. So I think this should be fixed in
libpq. Not sure exactly how though. It would seem that for the
PQputCopyData the best would be to return 0 to indicate that the
operation should be retried. No idea for the other possible cases of
SSL_write() retry though. What do you think?

regards,
Martin
/* Debugging libpq SSL connection crashes. */
#include stdio.h
#include stdlib.h
#include stdarg.h

#include libpq-fe.h

#define BUFSIZE 1
#define CONNINFO host=localhost sslmode=require

static char buf[BUFSIZE];


void die(char *fmt, ...)
{
va_list ap;

va_start(ap, fmt);
vfprintf(stderr, fmt, ap);
va_end(ap);
exit(1);
}

void exec_sql(PGconn *conn, const char *sql, int expected_status)
{
PGresult *res;

fprintf(stderr, sql: %s\n, sql);
res = PQexec(conn, sql);
if (PQresultStatus(res) != expected_status)
die(failed: %s\n, sql, PQerrorMessage(conn));
}

int main(void)
{
PGconn *conn;
PGresult *res;
unsigned long bytes = 0;
int count = 0;

conn = PQconnectdb(CONNINFO);
if (PQstatus(conn) != CONNECTION_OK)
die(connection to database failed: %s, PQerrorMessage(conn));
fprintf(stderr, Connected to: %s\n, CONNINFO);

exec_sql(conn, BEGIN, PGRES_COMMAND_OK);
exec_sql(conn, DROP TABLE IF exists test_ssl_copy, PGRES_COMMAND_OK);
exec_sql(conn, CREATE TABLE test_ssl_copy(t text), PGRES_COMMAND_OK);
exec_sql(conn, COPY test_ssl_copy(t) FROM stdin, PGRES_COPY_IN);

/* Set the connection to non-blocking to enable crashing */
if (PQsetnonblocking(conn, 1) != 0)
die(PQsetnonblocking failed: %s, PQerrorMessage(conn));

while (1) {
int rc;

fprintf(stderr, PQputCopyData #%d buf:%lu total:%lu\n, count, sizeof(buf), bytes);
rc =