On Fri, May 22, 2009 at 03:14:46PM -0700, David Schwartz wrote:
>
> Thor Lancelot Simon wrote:
>
> > 1) I have data to write, and the SSL session's descriptor
> > selects as ready-to-write.
>
> This already scares me. You have data to write on the unencrypted stream to
> the SSL connection. The SSL session's descriptor write is for the encrypted
> stream between SSL implementations. Why would you check one when you want to
> write to the other?
I'm sorry, what are you talking about?  I have data to write on the SSL *.
I'm not interested in doing two system calls for every logical write
operation.  If OpenSSL wants to tell me WANT_WRITE -- or WANT_READ -- after
I issue an SSL_write(), it's certainly free to do that; but on what rational
grounds ought I not issue an SSL_write() if I know the file descriptor
underlying the SSL * is write-ready?

> What you are doing makes failure scenarios possible. Consider:
>
> 1) You call SSL_write. Since a renegotiation is in progress, it tries to
> read the data to complete the renegotiation.
>
> 2) It reads the renegotiation data, and some application data. It completes
> the write.
>
> 3) You select on the socket for 'read', but you don't get a hit because the
> data has already been read.

My code will never select on the socket underlying the SSL * for read
unless it has drained all data previously requested for that direction
using SSL_read().  You seem to believe some kind of desynchronization is
possible here which I am not seeing.

> So why were you waiting for the SSL session's file descriptor to select as
> ready-to-read?

Because all data I ever actually requested via SSL_read() has already been
handed to me by OpenSSL.  If a renegotiation causes the socket to spuriously
select as ready-for-read, the worst that can happen, as far as I can tell,
is that I generate a call to SSL_read() which returns WANT_READ.  What is
the harm in this?

> An SSL connection has *ONE* *STATE*. When SSL_write completes normally, its
> state is "everything is fine". Its state is *NOT* "want read", which seems
> to be what you were assuming.

I think you are jumping to conclusions.  An SSL connection has two
directions of data flow.  Many applications which run OpenSSL in
non-blocking mode appear to treat the entire SSL connection as half-duplex,
but there is nothing about the SSL protocol itself which mandates
half-duplex operation, and I can't see anything in the OpenSSL documentation
which says that full-duplex operation is not supported.

My application has two state machines for interacting with OpenSSL, one for
read and one for write -- interlocking, necessarily, since there is global
state in the SSL session.  When SSL_read() or SSL_write() returns WANT_X
for some X, this is an event which forces synchronization of the state
machines: in the case in which X is the direction expected for normal
operation (e.g. WANT_READ from a call to SSL_read()), I can't tell whether
that was caused by a normal I/O drain or by a renegotiation which will
require pausing the other direction to complete (because of the very odd
"next call must be _exactly_ the same API function, with the same buffer
and length" rule of the API, which is semi-documented in various manual
pages).  So I go to some lengths to avoid triggering this kind of
synchronization, that is, to avoid causing the API to return WANT_READ or
WANT_WRITE to me in the first place.

You seem to be arguing that I should never select() on the SSL session's
file descriptor for ready-to-read unless SSL_read() or SSL_write() has
already returned WANT_READ to me.  But select() will never cause me to call
(for example) SSL_read() when I would not have called it if I did *not*
select() first: it can only cause me _not_ to call SSL_read() in cases
where I would have called it and received WANT_READ in return, if I
followed what appears to be your proposed heuristic of "always try the I/O
first, then sleep on I/O if nothing is ready".  So if I had this wrong, I
would _stall waiting for I/O which never appeared_ -- which is not what
happens.
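For concreteness, the write direction of what I'm describing has roughly
the following shape.  This is a sketch, not my actual code; the struct and
flag names are made up for illustration, error handling is thin, and it
assumes SSL_MODE_ENABLE_PARTIAL_WRITE has been set on the SSL_CTX:

    /*
     * Sketch of the write-direction state machine (hypothetical names).
     */
    #include <openssl/ssl.h>

    struct conn {
        SSL *ssl;
        int  fd;
        int  wr_wants_read;     /* SSL_write said WANT_READ: wait for fd readable  */
        int  wr_wants_write;    /* SSL_write said WANT_WRITE: wait for fd writable */
    };

    /* Called when select() reports the event the write side was waiting
     * for, or when fresh data is queued and no WANT_* condition is pending. */
    int try_write(struct conn *c, const void *buf, int len)
    {
        c->wr_wants_read = c->wr_wants_write = 0;

        int n = SSL_write(c->ssl, buf, len);
        if (n > 0)
            return n;                   /* n bytes of buf were consumed */

        switch (SSL_get_error(c->ssl, n)) {
        case SSL_ERROR_WANT_READ:       /* e.g. a renegotiation is in progress */
            c->wr_wants_read = 1;       /* retry the same SSL_write(buf, len) later */
            return 0;
        case SSL_ERROR_WANT_WRITE:      /* the socket buffer is full */
            c->wr_wants_write = 1;      /* retry the same SSL_write(buf, len) later */
            return 0;
        default:
            return -1;                  /* fatal; see ERR_get_error() */
        }
    }

The read direction keeps a symmetric pair of flags around SSL_read(); the
two machines only have to be reconciled when one of them gets a WANT_*
answer, which is the synchronization I described above.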
> > 5) I call SSL_read with a 4096-byte buffer. SSL_read returns
> > -1 and error is SSL_ERROR_WANT_READ.
>
> So now you know that a read hit from select is needed (or other forward
> progress must be made) before SSL_read can succeed.

Sure.

> > 6) I set a flag to ensure I do not call SSL_write() (it isn't
> > clear to me this is necessary -- the documentation is vague)
> > and select on the SSL session's descriptor for read.
>
> Why would you avoid writing? It's possible the other side will not send any
> application data.

Because the documentation appears in several places to state that when I
receive WANT_X from _any_ OpenSSL API operation in non-blocking mode, the
next API operation I issue on that SSL * must be the exact same API
operation, with the same parameters.  We've also seen OpenSSL developers
say as much on this mailing list in the past -- and in testing, I have in
fact seen misbehavior with previous versions of OpenSSL (e.g. corrupt
output SSL records) which the library should never be able to generate,
and which disappears if I impose this rule on my application's use of the
API.

Fortunately, because of the application-layer protocol I am using, this
cannot cause me to pause forever.  But, in the general case, as I
understand it, you're positing that:

1) I see the file descriptor select as ready-for-read because a
   renegotiation initiated by the peer was in process, and the peer sent
   some data which will never cause any application-layer bytes to be
   ready for me.

2) I arrange not to SSL_write, temporarily, by setting a flag.

3) I call SSL_read(), which consumes the bytes the peer sent, and then,
   since it has no app data to give me, returns -1 WANT_READ.

4) I sleep on the descriptor for read.  I won't do any other API
   operation, as I've excluded myself from writing.

If the peer will never generate any data towards me of its own accord (if
the application-layer protocol lets it just go silent unless I send it
something), you conclude that I will sleep forever.  I think in general
your analysis of this possibility is correct, and that I should better
investigate why I believe I need to avoid calling SSL_write in this case,
and how to fix it if it really is so inside OpenSSL.  But I also think
that can't be the underlying issue causing the symptom I am actually
seeing below.

> > 7) The SSL session's file descriptor selects as ready for read,
> > I call SSL_read with the same 4096-byte buffer at the same address,
> > and SSL_read returns -1 and error is SSL_ERROR_SSL.
> >
> > I cannot understand why #7 occurs. Is SSL_MODE_ENABLE_PARTIAL_WRITE just
> > incompatible with non-blocking mode and renegotiations?
>
> Perhaps the other side gave up waiting for you and closed the connection
> down abruptly? What does the other side show when this happens?

This happens in a matter of milliseconds; the other side shows either an
SSL protocol error (though I haven't caught a bad SSL record coming out of
OpenSSL towards the peer _yet_) or a close of the underlying TCP connection
(FIN or RST sent towards the other side from my side).  It seems to me that
no matter what else I may have wrong, *that* should not be the result of my
control flow.

Let's suppose you're right and I can hang forever waiting for application
data from the peer which never arrives, because of when I chose to
select().  Nonetheless, it should not be the case that, a matter of
milliseconds after #7, OpenSSL slams the door on the peer.  And, curiously,
if I don't use PARTIAL_WRITE, it doesn't do so...
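To make 5) through 7) concrete, the read side of the sketch looks roughly
like this; again the names are made up (this is a sketch, not my actual
code), but it shows where the SSL_ERROR_SSL in #7 surfaces:

    /*
     * Sketch of the read direction for steps 5)-7): the same 4096-byte
     * buffer and length are passed on every retry, per the "exact same
     * call" rule discussed above.
     */
    #include <stdio.h>
    #include <openssl/ssl.h>
    #include <openssl/err.h>

    #define BUFSZ 4096

    /* Called when select() reports the descriptor readable. */
    int try_read(SSL *ssl, char buf[BUFSZ])
    {
        int n = SSL_read(ssl, buf, BUFSZ);      /* same buffer, same length */
        if (n > 0)
            return n;                           /* application data arrived */

        switch (SSL_get_error(ssl, n)) {
        case SSL_ERROR_WANT_READ:               /* nothing for us yet; select again */
            return 0;
        case SSL_ERROR_ZERO_RETURN:             /* clean shutdown by the peer */
            return -1;
        case SSL_ERROR_SSL:                     /* step 7: the case I can't explain */
        default:
            ERR_print_errors_fp(stderr);        /* error queue says what OpenSSL objected to */
            return -1;
        }
    }

(For reference, the mode in question is set with
SSL_CTX_set_mode(ctx, SSL_MODE_ENABLE_PARTIAL_WRITE).)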
Thor