Re: [HACKERS] libpq SSL with non-blocking sockets
On 07/24/2011 11:33 PM, Tom Lane wrote: I've applied the simplified fix (just set SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER) as well as a patch to improve the error reporting situation. Cool that this turned out to be a one-line fix. Thanks! regards, Martin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
On 07/16/2011 12:46 AM, Tom Lane wrote: I think the direction to move in ought to be to use the existing buffer as-is, and have pqCheckOutBufferSpace refuse to enlarge the buffer while we are in write frozen state. It should be OK to append data to the buffer, though, so long as we remember how much we're allowed to pass to SSL_write when we next try to write. Alternative to freezing the outBuffer would be to set SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER mode during SSL initialisation. That would enable the buffer address to be changed in-between the SSL_write calls, so long as the content remains the same. Attached is a patch that uses the single buffer approach described by Tom, but with a moving SSL write buffer enabled. Modifying pqCheckOutBufferSpace is also an option, but it'd break some (arguably already broken) client applications that don't have proper retry handling. Notably some versions of psycopg2 have problems with handling zero return values from PQputCopyData. So ISTM that from backporting perspective the moving write buffer is a bit safer. I'll see if I can come up with a test case for the SSL_read retry before attempting to fix that. regards, Martin diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c index 17dde4a..977e37c 100644 --- a/src/interfaces/libpq/fe-misc.c +++ b/src/interfaces/libpq/fe-misc.c @@ -793,20 +793,26 @@ pqSendSome(PGconn *conn, int len) while (len 0) { int sent; + int write_size = len; char sebuf[256]; -#ifndef WIN32 - sent = pqsecure_write(conn, ptr, len); -#else - +#ifdef WIN32 /* * Windows can fail on large sends, per KB article Q201213. The * failure-point appears to be different in different versions of * Windows, but 64k should always be safe. */ - sent = pqsecure_write(conn, ptr, Min(len, 65536)); + write_size = Min(len, 65536); +#endif + +#ifdef USE_SSL + /* We have leftovers from previous SSL write, send these first. */ + if (conn-ssl_retry_bytes) + write_size = conn-ssl_retry_bytes; #endif + sent = pqsecure_write(conn, ptr, write_size); + if (sent 0) { /* @@ -857,11 +863,30 @@ pqSendSome(PGconn *conn, int len) return -1; } } +#ifdef USE_SSL + else if (sent == 0) + { + /* + * We are in non-blocking mode, set up for retrying a SSL write + * operation. OpenSSL requires the same buffer to be passed to the + * write retry, so we must remember the write size. However, we + * don't need to keep the original buffer address as the + * SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER mode ought to be set. + */ + if (!conn-ssl_retry_bytes) +conn-ssl_retry_bytes = write_size; + } +#endif else { ptr += sent; len -= sent; remaining -= sent; +#ifdef USE_SSL + /* Forget the retry buffer size if all the data has been sent */ + if (conn-ssl_retry_bytes) +conn-ssl_retry_bytes = 0; +#endif } if (len 0) diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c index cd1292c..540561d 100644 --- a/src/interfaces/libpq/fe-secure.c +++ b/src/interfaces/libpq/fe-secure.c @@ -757,6 +757,9 @@ init_ssl_system(PGconn *conn) #endif return -1; } + + /* Enable moving write buffer to simplify write retry handling. */ + SSL_CTX_set_mode(SSL_context, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER); } #ifdef ENABLE_THREAD_SAFETY diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h index d56ef5d..ae212a6 100644 --- a/src/interfaces/libpq/libpq-int.h +++ b/src/interfaces/libpq/libpq-int.h @@ -407,6 +407,8 @@ struct pg_conn X509 *peer; /* X509 cert of server */ char peer_dn[256 + 1]; /* peer distinguished name */ char peer_cn[SM_USER + 1]; /* peer common name */ + + int ssl_retry_bytes; /* buffer size for SSL write retry */ #ifdef USE_SSL_ENGINE ENGINE *engine; /* SSL engine, if any */ #else -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
Martin Pihlak martin.pih...@gmail.com writes: On 07/16/2011 12:46 AM, Tom Lane wrote: I think the direction to move in ought to be to use the existing buffer as-is, and have pqCheckOutBufferSpace refuse to enlarge the buffer while we are in write frozen state. It should be OK to append data to the buffer, though, so long as we remember how much we're allowed to pass to SSL_write when we next try to write. Alternative to freezing the outBuffer would be to set SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER mode during SSL initialisation. Hmmm ... that is a really interesting flag. The documentation is inadequate, but I googled a bit and found this thread about it: http://www.mail-archive.com/openssl-users@openssl.org/msg45440.html in which it's stated that (1) the prohibition on moving the buffer is actually just a sort of sanity check, and can be turned off using this flag; (2) it is okay to increase, but not reduce, the length parameter to SSL_write in the next call after a WANT_READ/WANT_WRITE return. So this does look like it'd fix the issue for SSL_write, without needing to introduce a concept of a write frozen buffer state. I am wondering though how far back support for this flag exists in OpenSSL, and whether the situation is the same for SSL_read or not. I don't see any SSL_MODE_ACCEPT_MOVING_READ_BUFFER macro ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
I wrote: So this does look like it'd fix the issue for SSL_write, without needing to introduce a concept of a write frozen buffer state. I am wondering though how far back support for this flag exists in OpenSSL, A bit of archaeology reveals that the flag was introduced in OpenSSL 0.9.4, released in 1999. So it's probably Old Enough. (BTW, the 0.9.4 changelog credits this change to one Bodo Moeller ... so the comments from him in the other thread I linked to can be considered authoritative ...) Still wondering about the SSL_read end of it, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
I wrote: Still wondering about the SSL_read end of it, though. And on that front, some digging around in the OpenSSL source code indicates that they do all their work in internal buffers, and transfer data into SSL_read's result buffer only when ready to return it. So the claim in the documentation that SSL_read has a restriction comparable to SSL_write is a lie: there is no case where they'll copy some data into the buffer and then return -1. So the SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER solution looks like a good fix. I'll see about applying it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
Martin Pihlak martin.pih...@gmail.com writes: If possible I would like the fix to be backported as well. This is quite a nasty bug and difficult to track down. Especially as the actual SSL error messages are masked by the server closed the connection unexpectedly message. I've applied the simplified fix (just set SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER) as well as a patch to improve the error reporting situation. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
On 24 July 2011 21:33, Tom Lane t...@sss.pgh.pa.us wrote: I've applied the simplified fix (just set SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER) as well as a patch to improve the error reporting situation. I'm not exactly sure why, and don't have time to investigate right now, but this commit (fee476da952a1f02f7ccf6e233fb4824c2bf6af4) appears to have broken the build for me: gcc -O0 -g -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -g -pthread -D_REENTRANT -D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS -fpic -DFRONTEND -DUNSAFE_STAT_OK -I. -I../../../src/include -D_GNU_SOURCE -I../../../src/port -I../../../src/port -DSO_MAJOR_VERSION=5 -c -o fe-misc.o fe-misc.c -MMD -MP -MF .deps/fe-misc.Po fe-misc.c: In function ‘pqReadData’: fe-misc.c:651:11: error: ‘PGconn’ has no member named ‘ssl’ fe-misc.c:743:11: error: ‘PGconn’ has no member named ‘ssl’ fe-misc.c:761:10: error: ‘PGconn’ has no member named ‘ssl’ fe-misc.c: In function ‘pqSendSome’: fe-misc.c:841:14: error: ‘PGconn’ has no member named ‘ssl’ fe-misc.c:861:14: error: ‘PGconn’ has no member named ‘ssl’ The problem goes away if I check out the commit made immediately prior to this one - d0c23026b2499ba9d6797359241ade076a5a677d. I'm building with my usual, rather generic settings for hacking. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
On 07/24/2011 07:49 PM, Peter Geoghegan wrote: On 24 July 2011 21:33, Tom Lanet...@sss.pgh.pa.us wrote: I've applied the simplified fix (just set SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER) as well as a patch to improve the error reporting situation. I'm not exactly sure why, and don't have time to investigate right now, but this commit (fee476da952a1f02f7ccf6e233fb4824c2bf6af4) appears to have broken the build for me: Yeah, looks like it fails unless --with-openssl is set. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
Robert Haas robertmh...@gmail.com writes: On Fri, Jul 8, 2011 at 9:36 AM, Martin Pihlak martin.pih...@gmail.com wrote: On 07/03/2011 05:08 AM, Steve Singer wrote: Since the original patch was submitted as a WIP patch and this version wasn't sent until well into the commit fest I am not sure if it qualifies for a committer during this commitfest or if it needs to wait until the next one. If possible I would like the fix to be backported as well. This is quite a nasty bug and difficult to track down. Especially as the actual SSL error messages are masked by the server closed the connection unexpectedly message. I would not be inclined to back-patch this straight away. I agree it's an important fix, but I am a little worried about the chances of breaking something else... then again, I don't have the only vote here. I reviewed this patch a bit. I agree that we have a bug here that should be fixed and backported, but I don't think this patch is ready. Some problems: 1. The behavior under low-memory conditions is pretty intolerable. As coded, a malloc failure here: +conn-outBufSize = Max(16 * 1024, remaining); +conn-outBuffer = malloc(conn-outBufSize); +if (conn-outBuffer == NULL) +{ +printfPQExpBuffer(conn-errorMessage, + cannot allocate memory for output buffer\n); +return -1; +} leaves libpq's internal state completely corrupt --- outBuffer is null, which will result in instant coredump on any future access, but that's just the tip of the iceberg because we've also lost buffered data and messed up the none-too-clearly-defined-anyway state of which pending data is in which buffer. In general, I don't think it's a smart idea to proceed by duplicating the buffer contents in this situation, particularly not if the most obvious way to cause the problem is to have a very large buffer :-(. I think the direction to move in ought to be to use the existing buffer as-is, and have pqCheckOutBufferSpace refuse to enlarge the buffer while we are in write frozen state. It should be OK to append data to the buffer, though, so long as we remember how much we're allowed to pass to SSL_write when we next try to write. 2. According to the OpenSSL man pages, *both* SSL_read and SSL_write are defined to need to be re-called with the identical arguments after a WANT_READ or WANT_WRITE return. This means that pqReadData() is also at risk for this type of bug. I believe it could only happen in code paths where we call pqReadData when there is already some data in the buffer: if the caller then consumes some of that data, the next call to pqReadData would try to compact the buffer contents before making the pqsecure_read call. I think we probably need a read frozen state in which we won't enlarge or compact the inBuffer until SSL_read succeeds. 3. As of 9.0, somebody's decided that the backend needs nonblock read semantics in some cases. I probably should have complained harder about that before it went in, but since it's in, the backend is at risk for this same type of issue in secure_read(). I will mark the patch Returned With Feedback. I think that we need to address all these issues before we consider applying it. If we weren't hard up against the end of the CommitFest I might have a go at it myself, but I can't justify the time right now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
On 07/03/2011 05:08 AM, Steve Singer wrote: Since the original patch was submitted as a WIP patch and this version wasn't sent until well into the commit fest I am not sure if it qualifies for a committer during this commitfest or if it needs to wait until the next one. If possible I would like the fix to be backported as well. This is quite a nasty bug and difficult to track down. Especially as the actual SSL error messages are masked by the server closed the connection unexpectedly message. regards, Martin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
On Fri, Jul 8, 2011 at 9:36 AM, Martin Pihlak martin.pih...@gmail.com wrote: On 07/03/2011 05:08 AM, Steve Singer wrote: Since the original patch was submitted as a WIP patch and this version wasn't sent until well into the commit fest I am not sure if it qualifies for a committer during this commitfest or if it needs to wait until the next one. If possible I would like the fix to be backported as well. This is quite a nasty bug and difficult to track down. Especially as the actual SSL error messages are masked by the server closed the connection unexpectedly message. I would not be inclined to back-patch this straight away. I agree it's an important fix, but I am a little worried about the chances of breaking something else... then again, I don't have the only vote here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
On 11-06-28 02:14 PM, Martin Pihlak wrote: Thanks for the review! I have since simplified the patch to assume that partial SSL writes are disabled -- according to SSL_write(3) this is the default behaviour. Now the SSL retry buffer only holds the data to be retried, the remainder is moved to the new outBuffer. That sounds okay. Does it make sense to add in a check to verify that SSL didn't send a partial write? I don't know how bad openssl is about changing default behaviours or if we are concerned about protecting against someone changing the SSL parameters. My inclination is that this isn't needed but I'll raise the issue. Fixed. New version of the patch attached. Otherwise this version of the patch looks good to me. The only testing I have done is running the test program you sent earlier on in the thread and verified that the regression tests all pass. Other than something like your test program I'm not sure how else this bug can be induced. Since the original patch was submitted as a WIP patch and this version wasn't sent until well into the commit fest I am not sure if it qualifies for a committer during this commitfest or if it needs to wait until the next one. regards, Martin
Re: [HACKERS] libpq SSL with non-blocking sockets
On 11-06-28 02:14 PM, Martin Pihlak wrote: Hmm, I thought I thought about that. There was a check in the original patch: if (conn-sslRetryBytes || (conn-outCount - remaining) 0) So if the SSL retry buffer was emptied it would return 1 if there was something left in the regular output buffer. Or did I miss something there? The issue I saw in the original patch was that at that point in the function, sslRetryBytes could be zero (if the data was sent) but conn-outCount - remaining would be an incorrect value since remaining is the number of bytes left to send from sslRetryBuf NOT conn-outBuffer. This is no longer an issue in the updated patch. I will try to take a closer look at your updated patch in the next few days. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
On 06/25/2011 12:14 AM, Steve Singer wrote: I'm not a libpq guru but I've given your patch a look over. Thanks for the review! I have since simplified the patch to assume that partial SSL writes are disabled -- according to SSL_write(3) this is the default behaviour. Now the SSL retry buffer only holds the data to be retried, the remainder is moved to the new outBuffer. -The patch doesn't compile with non-ssl builds, the debug at the bottom of PQSendSome isn't in an #ifdef Ack, there was another one in pqFlush. Fixed that. -I don't think your handling the return code properly. Consider this case. pqSendSome(some data) sslRetryBuf = some Data return 1 pqSendSome(more data) it sends all of 'some data' returns 0 I think 1 should be returned because all of 'more data' still needs to be sent. I think returning a 0 will break PQsetnonblocking if you call it when there is data in both sslRetryBuf and outputBuffer. Hmm, I thought I thought about that. There was a check in the original patch: if (conn-sslRetryBytes || (conn-outCount - remaining) 0) So if the SSL retry buffer was emptied it would return 1 if there was something left in the regular output buffer. Or did I miss something there? We might even want to try sending the data in outputBuffer after we've sent all the data sitting in sslRetryBuf. IMHO that'd add unnecessary complexity to the pqSendSome. As this only happens in non-blocking mode the caller should be well prepared to handle the retry. If you close the connection with an outstanding sslRetryBuf you need to free it. Fixed. New version of the patch attached. regards, Martin diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 9e4807e..8dc0226 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -2792,6 +2792,10 @@ freePGconn(PGconn *conn) if (conn-gsslib) free(conn-gsslib); #endif +#if defined(USE_SSL) + if (conn-sslOutBufPtr) + free(conn-sslOutBufPtr); +#endif /* Note that conn-Pfdebug is not ours to close or free */ if (conn-last_query) free(conn-last_query); diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c index 17dde4a..d0c7812 100644 --- a/src/interfaces/libpq/fe-misc.c +++ b/src/interfaces/libpq/fe-misc.c @@ -781,6 +781,8 @@ pqSendSome(PGconn *conn, int len) char *ptr = conn-outBuffer; int remaining = conn-outCount; int result = 0; + int sent = 0; + bool shiftOutBuffer = true; if (conn-sock 0) { @@ -789,23 +791,34 @@ pqSendSome(PGconn *conn, int len) return -1; } +#ifdef USE_SSL + if (conn-sslRetryPos) + { + /* + * We have some leftovers from previous SSL write attempts, these need + * to be sent before the regular output buffer. + */ + len = remaining = conn-sslRetrySize; + ptr = conn-sslRetryPos; + shiftOutBuffer = false; + } +#endif + /* while there's still data to send */ while (len 0) { - int sent; char sebuf[256]; + int write_size = len; -#ifndef WIN32 - sent = pqsecure_write(conn, ptr, len); -#else - +#ifdef WIN32 /* * Windows can fail on large sends, per KB article Q201213. The * failure-point appears to be different in different versions of * Windows, but 64k should always be safe. */ - sent = pqsecure_write(conn, ptr, Min(len, 65536)); + write_size = Min(len, 65536); #endif + sent = pqsecure_write(conn, ptr, write_size); if (sent 0) { @@ -857,6 +870,38 @@ pqSendSome(PGconn *conn, int len) return -1; } } +#ifdef USE_SSL + else if (sent == 0 conn-ssl !conn-sslRetryPos) + { + /* + * pqsecure_write indicates that a SSL write needs to be retried. + * With non-blocking connections we need to ensure that the buffer + * passed to the SSL_write() retry is the the exact same buffer as + * in previous write -- see SSL_write(3SSL) for more on this. For + * this we need to save the the original buffer and allocate a new + * buffer for outgoing data. + */ + conn-sslOutBufPtr = conn-outBuffer; + conn-sslRetryPos = ptr; + conn-sslRetrySize = write_size; + + /* + * Allocate a new output buffer and prepare to move the remainder + * of the original outBuffer there. + */ + remaining -= write_size; + ptr += write_size; + + conn-outBufSize = Max(16 * 1024, remaining); + conn-outBuffer = malloc(conn-outBufSize); + if (conn-outBuffer == NULL) + { +printfPQExpBuffer(conn-errorMessage, + cannot allocate memory for output buffer\n); +return -1; + } + } +#endif else { ptr += sent; @@ -903,10 +948,36 @@ pqSendSome(PGconn *conn, int len) } } - /* shift the remaining contents of the buffer */ - if (remaining 0) - memmove(conn-outBuffer, ptr, remaining); - conn-outCount = remaining; +#ifdef USE_SSL + if (conn-sslRetryPos sent 0) + { + /* + * A SSL write was successfully retried, free the retry buffer and + * switch to the regular outBuffer.
Re: [HACKERS] libpq SSL with non-blocking sockets
On Fri, Jun 24, 2011 at 5:14 PM, Steve Singer ssinger...@sympatico.ca wrote: A few things I noticed (that you might be aware of since you mentioned it needs cleanup) -The patch doesn't compile with non-ssl builds, the debug at the bottom of PQSendSome isn't in an #ifdef -I don't think your handling the return code properly. Consider this case. pqSendSome(some data) sslRetryBuf = some Data return 1 pqSendSome(more data) it sends all of 'some data' returns 0 I think 1 should be returned because all of 'more data' still needs to be sent. I think returning a 0 will break PQsetnonblocking if you call it when there is data in both sslRetryBuf and outputBuffer. We might even want to try sending the data in outputBuffer after we've sent all the data sitting in sslRetryBuf. If you close the connection with an outstanding sslRetryBuf you need to free it. Based on these comments, I have updated the status of the patch to Waiting on Author. https://commitfest.postgresql.org/action/patch_view?id=594 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
On 11-06-15 03:20 PM, Martin Pihlak wrote: Yes, that sounds like a good idea -- especially considering that COPY is not the only operation that can cause SSL_write retries. This is of course still work in progress, needs cleaning up and definitely more testing. But at this point before going any further, I'd really appreciate a quick review from resident libpq gurus. Martin, I'm not a libpq guru but I've given your patch a look over. The idea of storing the original buffer in new members of connection structure for later re-use seems like a good way to approach this. A few things I noticed (that you might be aware of since you mentioned it needs cleanup) -The patch doesn't compile with non-ssl builds, the debug at the bottom of PQSendSome isn't in an #ifdef -I don't think your handling the return code properly. Consider this case. pqSendSome(some data) sslRetryBuf = some Data return 1 pqSendSome(more data) it sends all of 'some data' returns 0 I think 1 should be returned because all of 'more data' still needs to be sent. I think returning a 0 will break PQsetnonblocking if you call it when there is data in both sslRetryBuf and outputBuffer. We might even want to try sending the data in outputBuffer after we've sent all the data sitting in sslRetryBuf. If you close the connection with an outstanding sslRetryBuf you need to free it. Other than running your test program I didn't do any testing with this patch. Steve regards, Martin
Re: [HACKERS] libpq SSL with non-blocking sockets
On 06/12/2011 04:22 AM, Robert Haas wrote: One idea is that we could add outBuffer2/outBufSize2 to struct pg_conn, or something along those lines with less obviously stupid naming. Normally those would be unused, but in the special case where SSL indicates that we must retry the call with the same arguments, we set a flag that freezes the out buffer and forces any further data to be stuffed into outBuffer2. If or when the operation finally succeeds, we then move the data from outBuffer2 into outBuffer. Yes, that sounds like a good idea -- especially considering that COPY is not the only operation that can cause SSL_write retries. Attached is a first attempt at a patch to implement the described two buffer approach. This modifies pqSendSome so that whenever a SSL write retry is needed it saves the current outBuffer with its length and attempted write size to connection's sslRetry* variables. A new outBuffer is then allocated and used for any further data pushing. After the SSL write retry buffer is set up, any further calls to pqSendSome will first attempt to send the contents of the retry buffer, returning 1 to indicate that not all of the data could be sent. If the retry buffer is finally emptied it is freed and pqSendSome starts sending from the regular outBuffer. This is of course still work in progress, needs cleaning up and definitely more testing. But at this point before going any further, I'd really appreciate a quick review from resident libpq gurus. regards, Martin diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c index 17dde4a..b5e62c9 100644 --- a/src/interfaces/libpq/fe-misc.c +++ b/src/interfaces/libpq/fe-misc.c @@ -781,6 +781,8 @@ pqSendSome(PGconn *conn, int len) char *ptr = conn-outBuffer; int remaining = conn-outCount; int result = 0; + int sent = 0; + bool usingSSLWriteBuffer = false; if (conn-sock 0) { @@ -789,10 +791,28 @@ pqSendSome(PGconn *conn, int len) return -1; } +#ifdef USE_SSL + if (conn-sslRetryPos) + { + /* + * We have some leftovers from previous SSL write attempts, send those + * instead of the regular output buffer. + */ + len = conn-sslRetryBytes; + ptr = conn-sslRetryPos; + remaining = conn-sslRetryBufSize; + + if (conn-Pfdebug) + fprintf(conn-Pfdebug, pqSendSome: using SSL write retry buffer: %p retry: %d total: %d\n, +ptr, len, remaining); + + usingSSLWriteBuffer = true; + } +#endif + /* while there's still data to send */ while (len 0) { - int sent; char sebuf[256]; #ifndef WIN32 @@ -807,6 +827,10 @@ pqSendSome(PGconn *conn, int len) sent = pqsecure_write(conn, ptr, Min(len, 65536)); #endif + if (conn-Pfdebug) + fprintf(conn-Pfdebug, pqSendSome: write buf: %p len: %d sent: %d\n, + ptr, len, sent); + if (sent 0) { /* @@ -857,6 +881,35 @@ pqSendSome(PGconn *conn, int len) return -1; } } +#ifdef USE_SSL + else if (sent == 0 conn-ssl) + { + /* + * With non-blocking SSL connections we need to ensure that the + * buffer passed to the pqsecure_write() retry is the the exact + * same buffer as in previous write -- see SSL_write(3SSL) for more + * on this. For this we need to keep the the original buffer and + * remember its length. Also a new outBuffer is needed, but the + * actual allocation is deferred to the end of the function. + * + * For non-SSL connections none of this is needed. + */ + if (!conn-sslRetryPos) + { +if (conn-Pfdebug) + fprintf(conn-Pfdebug, pqSendSome: SSL retry setup: buf: %p len: %d remain: %d\n, + conn-sslRetryBuf, len, remaining); + +conn-sslRetryBuf = conn-outBuffer; +conn-sslRetryPos = ptr; +conn-sslRetryBytes = len; +conn-sslRetryBufSize = remaining; +conn-outBuffer = NULL; + } + + usingSSLWriteBuffer = true; + } +#endif else { ptr += sent; @@ -903,10 +956,73 @@ pqSendSome(PGconn *conn, int len) } } - /* shift the remaining contents of the buffer */ - if (remaining 0) - memmove(conn-outBuffer, ptr, remaining); - conn-outCount = remaining; +#ifdef USE_SSL + if (conn-sslRetryPos sent 0) + { + /* + * A SSL write was successfully retried, advance the retry buffer + * position to the rest of the buffer. Free the buffer if all of its + * contents are delivered. + */ + conn-sslRetryPos = ptr; + conn-sslRetryBytes = remaining; + conn-sslRetryBufSize = remaining; + + if (conn-sslRetryBufSize = 0) + { + if (conn-Pfdebug) +fprintf(conn-Pfdebug, SSL retry clean-up: freed buf=%p\n, + conn-sslRetryBuf); + + free(conn-sslRetryBuf); + conn-sslRetryBytes = conn-sslRetryBufSize = 0; + conn-sslRetryBuf = conn-sslRetryPos = NULL; + } + + if (conn-sslRetryBytes || (conn-outCount - remaining) 0) + { + /* + * We still have some data left in the SSL retry buffers or the + * outBuffer. Return 1 to indicate that further retries are needed + * to flush the entire output buffer. + */ +
Re: [HACKERS] libpq SSL with non-blocking sockets
On Thu, Jun 9, 2011 at 10:39 AM, Martin Pihlak martin.pih...@gmail.com wrote: In PQputCopyData #2 it is visible that the first SSL_write called from pqFlush failed with SSL_ERROR_WANT_WRITE. The next SSL_write should have been a retry with the same parameters, but instead was passed a buffer with a different address and length. Hence the bad write retry. Some googling turned out similar issues for other projects using SSL with non-blocking sockets. I'm not surprised. Setting the API so that it requires the same buffer (and not just a buffer pointing to the same data, or with the same initial contents) was maybe not the greatest design decision. The possible workarounds are to disable SSL or to disable non-blocking libpq connections. Both are not always possible - security reasons, 3rd party applications, drivers, etc. So I think this should be fixed in libpq. Not sure exactly how though. It would seem that for the PQputCopyData the best would be to return 0 to indicate that the operation should be retried. No idea for the other possible cases of SSL_write() retry though. What do you think? It seems to me we should try to paper over the problem within libpq rather than allowing it to percolate any higher up the call stack. One idea is that we could add outBuffer2/outBufSize2 to struct pg_conn, or something along those lines with less obviously stupid naming. Normally those would be unused, but in the special case where SSL indicates that we must retry the call with the same arguments, we set a flag that freezes the out buffer and forces any further data to be stuffed into outBuffer2. If or when the operation finally succeeds, we then move the data from outBuffer2 into outBuffer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] libpq SSL with non-blocking sockets
I believe I have found a bug in libpq COPY handling with non-blocking SSL connections. The bug manifests itself by dropping the connection in PGputCopyData() with server closed the connection unexpectedly message. The connection drop only occurs with nonblocking connections - blocking connections work as expected. I'll skip a lot of debugging details, but what I found out was that the connection drop was caused by a SSL bad write retry from SSL_write(). Which in turn appears to be caused by bad data passed to SSL_write() retry. So, what happened was that at some point SSL_write() returned SSL_ERROR_WANT_WRITE meaning that the write needs to be retried. Now, according to OpenSSL manual the retry SSL_write needs to be passed the same arguments as the previous call: http://www.openssl.org/docs/ssl/SSL_write.html#WARNING However the next SSL_write was passed another buffer with a different length and it failed with SSL_ERROR_SSL bad write retry. Which in turn caused the pqSendSome to drop the connection and failed the PQputCopyData. Actually, now that I think of it, this is probably not only related to COPY handling ... The connection drop can be reproduced by introducing some network problems to the connection, so that the SSL_write needs to be retried. This can be most easily accomplished by passing a huge buffer to PQputCopyData. For instance on my laptop I can do about 3 PQputCopyData with a 100MB buffer before the connection is dropped. Others results may vary. Also, it doesn't seem to matter what the libpq version is used - I initially started on 8.3, but later confirmed that the same problem exists on 8.4, 9.0 and 9.1. Tested with OpenSSL 0.9.8g and 0.9.8o, no idea about other versions. The following is a log excerpt from my test program (attached) with some extra logging added to libpq: PQputCopyData #0 buf:1 total:0 pqFlush() pqFlush() = 0 pqPutMsgEnd() pqSendSome(9744) SSL_write(0x7e59d60, 0x7ff5a7eac010, 9744) = 9744: SSL_error: 0, errno: 0 pqPutMsgEnd() = 0 PQputCopyData #1 buf:1 total:1 pqPutMsgEnd() pqSendSome(9744) SSL_write(0x7e59d60, 0x7ff5a7eac010, 9744) = -1: SSL_error: 3, errno: 11 pqPutMsgEnd() = 0 PQputCopyData #2 buf:1 total:2 pqFlush() pqSendSome(10266) SSL_write(0x7e59d60, 0x7ff5a7eac010, 10266) = -1: SSL_error: 3, errno: 11 pqFlush() = 1 pqPutMsgEnd() pqSendSome(19488) SSL_write(0x7e59d60, 0x7ff597eab010, 19488) = -1: SSL_error: 1, errno: 0 SSL error: bad write retry pqPutMsgEnd() = -1 PQputCopyData() = -1: server closed the connection unexpectedly In PQputCopyData #2 it is visible that the first SSL_write called from pqFlush failed with SSL_ERROR_WANT_WRITE. The next SSL_write should have been a retry with the same parameters, but instead was passed a buffer with a different address and length. Hence the bad write retry. Some googling turned out similar issues for other projects using SSL with non-blocking sockets. The possible workarounds are to disable SSL or to disable non-blocking libpq connections. Both are not always possible - security reasons, 3rd party applications, drivers, etc. So I think this should be fixed in libpq. Not sure exactly how though. It would seem that for the PQputCopyData the best would be to return 0 to indicate that the operation should be retried. No idea for the other possible cases of SSL_write() retry though. What do you think? regards, Martin /* Debugging libpq SSL connection crashes. */ #include stdio.h #include stdlib.h #include stdarg.h #include libpq-fe.h #define BUFSIZE 1 #define CONNINFO host=localhost sslmode=require static char buf[BUFSIZE]; void die(char *fmt, ...) { va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } void exec_sql(PGconn *conn, const char *sql, int expected_status) { PGresult *res; fprintf(stderr, sql: %s\n, sql); res = PQexec(conn, sql); if (PQresultStatus(res) != expected_status) die(failed: %s\n, sql, PQerrorMessage(conn)); } int main(void) { PGconn *conn; PGresult *res; unsigned long bytes = 0; int count = 0; conn = PQconnectdb(CONNINFO); if (PQstatus(conn) != CONNECTION_OK) die(connection to database failed: %s, PQerrorMessage(conn)); fprintf(stderr, Connected to: %s\n, CONNINFO); exec_sql(conn, BEGIN, PGRES_COMMAND_OK); exec_sql(conn, DROP TABLE IF exists test_ssl_copy, PGRES_COMMAND_OK); exec_sql(conn, CREATE TABLE test_ssl_copy(t text), PGRES_COMMAND_OK); exec_sql(conn, COPY test_ssl_copy(t) FROM stdin, PGRES_COPY_IN); /* Set the connection to non-blocking to enable crashing */ if (PQsetnonblocking(conn, 1) != 0) die(PQsetnonblocking failed: %s, PQerrorMessage(conn)); while (1) { int rc; fprintf(stderr, PQputCopyData #%d buf:%lu total:%lu\n, count, sizeof(buf), bytes); rc =