On Fri, May 22, 2009 at 03:14:46PM -0700, David Schwartz wrote:
> 
> Thor Lancelot Simon wrote:
> 
> >     1) I have data to write, and the SSL session's descriptor
> >        selects as ready-to-write.
> 
> This already scares me. You have data to write on the unencrypted stream to
> the SSL connection. The SSL session's descriptor write is for the encrypted
> stream between SSL implementations. Why would you check one when you want to
> write to the other?

I'm sorry, what are you talking about?  I have data to write on the SSL *.

I'm not interested in doing two system calls for every logical write
operation.  If OpenSSL wants to tell me WANT_WRITE -- or WANT_READ --
after I issue an SSL_write(), it's certainly free to do that; but on
what rational grounds ought I not issue an SSL_write() if I know the
file descriptor underlying the SSL * is write-ready?
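
To be concrete, the write path I have in mind is roughly this (a sketch
with placeholder names, not my actual code):

#include <sys/select.h>
#include <openssl/ssl.h>

/*
 * Sketch of the write path under discussion: wait until the descriptor
 * underlying the SSL * is writable, then issue SSL_write() and let
 * OpenSSL tell us if it actually needs the other direction
 * (WANT_READ, e.g. mid-renegotiation).
 */
static int write_when_writable(int fd, SSL *ssl, const void *buf, int len)
{
    fd_set wfds;

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0)
        return -1;                     /* select() itself failed */

    int n = SSL_write(ssl, buf, len);
    if (n > 0)
        return n;                      /* some or all bytes written */

    switch (SSL_get_error(ssl, n)) {
    case SSL_ERROR_WANT_WRITE:         /* transport buffer full */
    case SSL_ERROR_WANT_READ:          /* handshake data must be read first */
        return 0;                      /* retry the same call later */
    default:
        return -1;                     /* real error */
    }
}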

> What you are doing makes failure scenarios possible. Consider:
> 
> 1) You call SSL_write. Since a renegotiation is in progress, it tries to
> read the data to complete the renegotiation.
> 
> 2) It reads the renegotiation data, and some application data. It completes
> the write.
> 
> 3) You select on the socket for 'read', but you don't get a hit because the
> data has already been read.

My code will never select on the socket underlying the SSL * for read unless
it has drained all data previously requested for that direction using
SSL_read().  You seem to believe some kind of desynchronization is possible
here which I am not seeing.

> So why were you waiting for the SSL session's file descriptor to select as
> ready-to-read?

Because all data I ever actually requested via SSL_read() has been handed
to me already by OpenSSL.  If a renegotiation causes the socket to
spuriously select as ready-for-read, the worst that can happen, as far as
I can tell, is that I generate a call to SSL_read() which returns
WANT_READ.  What is the harm in this?
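
In other words, the worst case as I see it looks like this (again a
sketch; buf and len are placeholders):

#include <openssl/ssl.h>

/*
 * Sketch of the "harmless" case: the descriptor selected readable only
 * because of handshake traffic, so SSL_read() consumes those records,
 * has no application data to hand back, and reports WANT_READ; the
 * caller just goes back to select()ing for read.
 */
static int read_some(SSL *ssl, char *buf, int len)
{
    int n = SSL_read(ssl, buf, len);
    if (n > 0)
        return n;                               /* application data */

    if (SSL_get_error(ssl, n) == SSL_ERROR_WANT_READ)
        return 0;                               /* nothing for us yet */

    return -1;                                  /* EOF or real error */
}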

> An SSL connection has *ONE* *STATE*. When SSL_write completes normally, its
> state is "everything is fine". Its state is *NOT* "want read", which seems
> to be what you were assuming.

I think you are jumping to conclusions.  An SSL connection has two
directions of data flow.  Many applications which run OpenSSL in
non-blocking mode appear to treat the entire SSL connection as
half-duplex, but there is nothing about the SSL protocol itself which
mandates half-duplex operation, and I can't see anything in the OpenSSL
documentation which states that full-duplex operation is unsupported.

My application has two state machines for interacting with OpenSSL, one
for read and one for write -- interlocking, necessarily, since there is
global state in the SSL session.  When SSL_read() or SSL_write() returns
WANT_X for some X, that is an event which forces synchronization of the
state machines: when X is the direction expected for normal operation
(e.g. WANT_READ from a call to SSL_read()), I can't tell whether it was
caused by a normal I/O drain or by a renegotiation which will require
pausing the other direction to complete (because of the very odd "the
next call must be exactly the same API function, with the same buffer
and length" rule which is semi-documented in various manual pages).  So
I go to some lengths to avoid triggering that kind of synchronization,
i.e. to avoid causing the API to return WANT_READ or WANT_WRITE to me
in the first place.
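
Concretely, the rule as I read it forces something like this on the
write side (a sketch only -- whether the restriction really is this
strict is part of what I'm asking):

#include <openssl/ssl.h>

/*
 * Sketch of the "retry with the identical call" rule: once SSL_write()
 * has reported WANT_READ or WANT_WRITE, the next operation on this
 * SSL * is SSL_write() again with the same buffer address and length,
 * so both must be remembered across trips through the event loop.
 */
struct pending_write {
    const char *buf;        /* must not move until the write completes */
    int         len;        /* must not change either */
    int         active;
};

static int resume_write(SSL *ssl, struct pending_write *pw)
{
    if (!pw->active)
        return 0;

    int n = SSL_write(ssl, pw->buf, pw->len);
    if (n > 0) {
        pw->active = 0;              /* done; the other direction may resume */
        return n;
    }
    switch (SSL_get_error(ssl, n)) {
    case SSL_ERROR_WANT_READ:        /* sleep for readable, then call again */
    case SSL_ERROR_WANT_WRITE:       /* sleep for writable, then call again */
        return 0;
    default:
        return -1;
    }
}

(As I understand it, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER is meant to
relax the same-address part of this for writes, but I am deliberately
not relying on it here.)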

You seem to be arguing that I should never select() on the SSL session's
file descriptor for ready-to-read unless SSL_read() or SSL_write() has
already returned WANT_READ to me.  But select() will never cause me
to call (for example) SSL_read() when I would not have called it if I
did *not* select() first: it can only cause me _not_ to call SSL_read()
in cases where I would have called it and received WANT_READ in return,
if I followed what appears to be your proposed heuristic of "always
try the I/O first, then sleep on I/O if nothing is ready".  So if I had
this wrong, I would _stall waiting for I/O which never appeared_ -- which
is not what happens.

> 
> >     5) I call SSL_read with a 4096-byte buffer.  SSL_read returns
> >        -1 and error is SSL_ERROR_WANT_READ.
> 
> So now you know that a read hit from select is needed (or other forward
> progress must be made) before SSL_read can succeed.

Sure.

> >     6) I set a flag to ensure I do not call SSL_write() (it isn't
> >        clear to me this is necessary -- the documentation is vague)
> >        and select on the SSL session's descriptor for read.
> 
> Why would you avoid writing? It's possible the other side will not send any
> application data.

Because the documentation appears in several places to state that when I
receive WANT_X from _any_ OpenSSL API operation in non-blocking mode, the
next API operation I issue on that SSL * must be the exact same API
operation, with the same parameters.  We've also seen OpenSSL developers
say as much on this mailing list in the past -- and in testing with
previous versions of OpenSSL, I have in fact seen misbehavior (e.g.
corrupt output SSL records, which the library should never generate)
that disappears if I impose this rule on my application's use of the API.

Fortunately, because of the application-layer protocol I am using, this
cannot cause me to pause forever.  But in the general case, as I
understand it, you're positing that:

1) I see the file descriptor select as ready-for-read because a reneg
   initiated by the peer was in progress, and the peer sent some data
   which will never cause any application-layer bytes to be ready for me.

2) I arrange not to SSL_write(), temporarily, by setting a flag.

3) I call SSL_read(), which consumes the bytes the peer sent, and then,
   since it has no app data to give me, returns -1 WANT_READ.

4) I sleep on the descriptor for read.  I won't do any other API
   operation, as I've excluded myself from writing.  If the peer will
   never generate any data towards me of its own accord (if the
   application-layer protocol lets it just go silent unless I send it
   something), you conclude that I will sleep forever.

I think your analysis of this possibility is, in general, correct, and
that I should investigate further why I believe I need to avoid calling
SSL_write() in this case, and how to work around it if OpenSSL really
does require that.
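
If I've understood you, the stall you're describing would come from
something like this (a sketch; write_blocked is the flag from step 2
above):

#include <sys/select.h>
#include <openssl/ssl.h>

/*
 * Sketch of the stall being described: after WANT_READ from SSL_read()
 * the flag set in step 2 keeps the write side idle, and this loop
 * sleeps only on readability.  If the peer never sends another byte,
 * we sit here forever.
 */
static int wait_for_app_data(int fd, SSL *ssl, char *buf, int len,
                             int *write_blocked)
{
    for (;;) {
        fd_set rfds;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        /* This select() blocks indefinitely if the peer stays silent. */
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
            return -1;

        int n = SSL_read(ssl, buf, len);
        if (n > 0) {
            *write_blocked = 0;      /* pending read satisfied; writes may resume */
            return n;
        }
        if (SSL_get_error(ssl, n) != SSL_ERROR_WANT_READ)
            return -1;
        /* WANT_READ again: the flag stays set and we loop -- the state
         * that can never make progress if the peer has gone silent. */
    }
}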

But I also think that can't be the underlying issue causing the symptom
I am actually seeing below.

> >     7) The SSL session's file descriptor selects as ready for read,
> >        I call SSL_read with the same 4096-byte buffer at the same address,
> >        and SSL_read returns -1 and error is SSL_ERROR_SSL.
> >
> > I cannot understand why #7 occurs.  Is SSL_MODE_ENABLE_PARTIAL_WRITE just
> > incompatible with non-blocking mode and renegotiations?
> 
> Perhaps the other side gave up waiting for you and closed the connection
> down abruptly? What does the other side show when this happens?

This happens in a matter of milliseconds; the other side shows either
an SSL protocol error (though I haven't caught a bad SSL record coming
out of OpenSSL towards the peer _yet_) or a close of the underlying
TCP connection (FIN or RST sent towards the other side from my side).

It seems to me that no matter what else I may have wrong, *that* should
not be the result of my control flow.  Let's suppose you're right and
I can hang forever waiting for application data from the peer which
never arrives, because of when I chose to select().  Nonetheless, it
should not be the case that in a matter of milliseconds after #7,
OpenSSL slams the door on the peer.

And, curiously, if I don't use PARTIAL_WRITE, it doesn't do so...
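
For reference, the only configuration difference between the failing and
working cases is whether this mode bit is set when the context is
created (ctx is a placeholder):

#include <openssl/ssl.h>

/* The mode bit in question: with it set I see the SSL_ERROR_SSL in
 * step 7; without it, I don't. */
static void enable_partial_write(SSL_CTX *ctx)
{
    SSL_CTX_set_mode(ctx, SSL_MODE_ENABLE_PARTIAL_WRITE);
}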

Thor
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
