RE: SSL_renegotiate broken in non-blocking mode with PARTIAL_WRITE?

David Schwartz Sat, 23 May 2009 05:31:46 -0700

> On Fri, May 22, 2009 at 03:14:46PM -0700, David Schwartz wrote:

> > Thor Lancelot Simon wrote:


> > >   1) I have data to write, and the SSL session's descriptor
> > >      selects as ready-to-write.

> > This already scares me. You have data to write on the unencrypted stream to
> > the SSL connection. The SSL session's descriptor write is for the encrypted
> > stream between SSL implementations. Why would you check one when you want to
> > write to the other?

> I'm sorry, what are you talking about?  I have data to write on the SSL *.

So you should be checking the writability of the SSL *, not the socket.

> I'm not interested in doing two system calls for every logical write
> operation.  If OpenSSL wants to tell me WANT_WRITE -- or WANT_READ --
> after I issue an SSL_write(), it's certainly free to do that; but on
> what rational grounds ought I not issue an SSL_write() if I know the
> file descriptor underlying the SSL * is write-ready?

Why do you know that? Why did you ask? You have no interest in writing to
the underlying file descriptor, so why are you checking it for writability?

Suppose a renegotiate is in progress and SSL_write needs to *read* in order
to make forward progress. Waiting for writability will be waiting an awfully
long time.
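
A minimal sketch of that situation, using Python's standard `ssl` module (a wrapper around OpenSSL) with in-memory BIOs instead of a real socket. The initial handshake stands in for a renegotiation here, since both are cases where the state machine must read before application data can move. The helper name `direction_needed` and the host name are assumptions made for illustration; `SSLWantReadError` is Python's surfacing of SSL_ERROR_WANT_READ.

```python
import ssl

# Client-side TLS object over in-memory BIOs, so no real socket is needed.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
incoming = ssl.MemoryBIO()   # bytes "received from the peer" (empty here)
outgoing = ssl.MemoryBIO()   # bytes OpenSSL wants sent to the peer
tls = ctx.wrap_bio(incoming, outgoing, server_hostname="example.invalid")


def direction_needed(operation, *args):
    """Run one SSL operation and report which transport direction blocks it.

    Hypothetical helper: returns None if the operation completed.
    """
    try:
        operation(*args)
        return None
    except ssl.SSLWantReadError:
        return "read"
    except ssl.SSLWantWriteError:
        return "write"


# A *write* of application data that cannot proceed until the peer is *read*.
blocked_on = direction_needed(tls.write, b"application data")

# Handshake bytes are queued for the peer even though no application data
# was accepted by the write call.
queued_for_peer = outgoing.pending
print(blocked_on, queued_for_peer > 0)
```

An application that only waited for the descriptor to become writable before retrying the write would never learn that the write is actually blocked on input.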

> > What you are doing makes failure scenarios possible. Consider:
> >
> > 1) You call SSL_write. Since a renegotiation is in progress, it tries to
> > read the data to complete the renegotiation.

> > 2) It reads the renegotiation data, and some application data. It completes
> > the write.

> > 3) You select on the socket for 'read', but you don't get a hit because the
> > data has already been read.

> My code will never select on the socket underlying the SSL * for read unless
> it has drained all data previously requested for that direction using
> SSL_read().

SSL_read drains *unencrypted* data, not encrypted data.

> You seem to believe some kind of desynchronization is possible
> here which I am not seeing.

Yes, it most certainly is. You are making assumptions by looking through the
OpenSSL black box. These assumptions will work well when they are true and
will fail hideously when they are false. One of the cases where they are
false is, surprise, renegotiation, when SSL_write may need to read from the
socket and SSL_read may need to write to the socket.

> > So why were you waiting for the SSL session's file descriptor to select as
> > ready-to-read?

> Because all data I ever actually requested via SSL_read() has been handed
> to me already by OpenSSL.  If a renegotiation causes the socket to
> spuriously select as ready-for-read, the worst that can happen, as far as
> I can tell, is that I generate a call to SSL_read() which returns
> WANT_READ.  What is the harm in this?

The harm is that you failed to call SSL_read. The data may have already been
read from the socket, say by a call to SSL_write (which can also read from
the socket).
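
The failure mode can be sketched without OpenSSL at all. The toy `RecordLayer` class below is a hypothetical stand-in for any library that reads greedily from a socket and hands data out in smaller units; it is not how OpenSSL is implemented, only an illustration of why a socket that does not select as readable does not imply that no data is available.

```python
import select
import socket


class RecordLayer:
    """Toy buffering layer (hypothetical): reads greedily, returns one line."""

    def __init__(self, sock):
        self.sock = sock
        self.buffer = b""

    def read_record(self):
        # Pull whatever the socket has, like a library filling its input buffer.
        if b"\n" not in self.buffer:
            self.buffer += self.sock.recv(4096)
        record, _, self.buffer = self.buffer.partition(b"\n")
        return record

    def pending(self):
        # Bytes already taken off the socket but not yet handed to the caller.
        return len(self.buffer)


peer, local = socket.socketpair()
peer.sendall(b"first\nsecond\n")          # two records arrive together

layer = RecordLayer(local)
first = layer.read_record()               # one recv() drains both records

# The socket no longer selects as readable, yet a record is still waiting.
readable, _, _ = select.select([local], [], [], 0)
socket_ready = bool(readable)
buffered = layer.pending()
print(first, socket_ready, buffered)

peer.close()
local.close()
```

Waiting on the socket at this point would stall until the peer sent something new, even though the second record could be delivered immediately.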

> > An SSL connection has *ONE* *STATE*. When SSL_write completes normally, its
> > state is "everything is fine". Its state is *NOT* "want read", which seems
> > to be what you were assuming.

> I think you are jumping to conclusions.  An SSL connection has two
> directions of data flow.  Many applications which run OpenSSL in
> non-blocking mode appear to treat the entire SSL connection as half-duplex,
> but there is nothing about the SSL protocol itself which mandates
> half-duplex operation, and I can't see anything in the OpenSSL
> documentation which states that full-duplex operation is required.

I'm not sure what you think state has to do with duplex.

> My application has two -- interlocking, necessarily, since there is global
> state of the SSL session -- state machines for interacting with OpenSSL,
> one for read and one for write.  When SSL_read() or SSL_write() return
> WANT_X for some X, this is an event which forces synchronization of the
> state machines -- since, in the case in which X is the direction expected
> for normal operation, e.g. WANT_READ from a call to SSL_read(), I can't tell
> whether that was caused by a normal I/O drain, or by a renegotiation which
> will require pausing the other direction to complete (because of the very
> odd "next call must be _exactly_ the same API function, with the same
> buffer and length" rule of the API which is semi-documented in various
> manual pages).  So I go to some lengths to avoid causing this kind of
> synchronization by causing the API to return to me WANT_READ or
> WANT_WRITE.

Set SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER. The only requirement then is that you
not try to "unwrite" data.

> You seem to be arguing that I should never select() on the SSL session's
> file descriptor for ready-to-read unless SSL_read() or SSL_write() has
> already returned to me ready-for-read.

Bingo! That is the only way to know that the SSL state machine cannot make
further progress unless it reads from the socket.

> But select() will never cause me
> to call (for example) SSL_read() when I would not have called it if I
> did *not* select() first: it can only cause me _not_ to call SSL_read()
> in cases where I would have called it and received WANT_READ in return,

How do you know you would have received WANT_READ in return? How do you know
the data was not already read from the socket?

> if I followed what appears to be your proposed heuristic of "always
> try the I/O first, then sleep on I/O if nothing is ready".  So if I had
> this wrong, I would _stall waiting for I/O which never appeared_ -- which
> is not what happens.

Huh? The rule is this simple: do not refuse to perform an SSL operation
until 'select' says to do so unless you know for a fact that the SSL state
machine requires that particular direction of I/O to make further progress.
Otherwise, you will deadlock on renegotiation.
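
One way to picture the rule is a loop that always attempts the operation first and only waits on the socket when the SSL layer names a direction. This is a sketch under assumptions, not a definitive implementation: `run_when_possible` and `fake_read` are hypothetical names, and `fake_read` merely imitates a non-blocking SSL read by raising Python's `ssl.SSLWantReadError` (the counterpart of SSL_ERROR_WANT_READ) on its first call.

```python
import select
import socket
import ssl


def run_when_possible(operation, sock, timeout=1.0, max_tries=5):
    """Attempt first; wait only in the direction the SSL layer asks for."""
    for _ in range(max_tries):
        try:
            return operation()
        except ssl.SSLWantReadError:
            select.select([sock], [], [], timeout)   # wait for readability
        except ssl.SSLWantWriteError:
            select.select([], [sock], [], timeout)   # wait for writability
    raise TimeoutError("operation still blocked")


peer, local = socket.socketpair()
calls = []


def fake_read():
    # Hypothetical stand-in for a non-blocking SSL read.
    calls.append(1)
    if len(calls) == 1:
        peer.sendall(b"x")   # peer data shows up only after the first attempt
        raise ssl.SSLWantReadError(ssl.SSL_ERROR_WANT_READ, "want read")
    return local.recv(1)


result = run_when_possible(fake_read, local)
print(result, len(calls))

peer.close()
local.close()
```

The select call is never the gatekeeper for the first attempt; it is used only after the operation itself has reported what it is waiting for.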

> >
> > >   5) I call SSL_read with a 4096-byte buffer.  SSL_read returns
> > >      -1 and error is SSL_ERROR_WANT_READ.
> >
> > So now you know that a read hit from select is needed (or other forward
> > progress must be made) before SSL_read can succeed.
>
> Sure.
>
> > >   6) I set a flag to ensure I do not call SSL_write() (it isn't
> > >      clear to me this is necessary -- the documentation is vague)
> > >      and select on the SSL session's descriptor for read.
> >
> > Why would you avoid writing? It's possible the other side will not send any
> > application data.
>
> Because the documentation appears in several places to state that when I
> receive WANT_X from _any_ OpenSSL API operation in non-blocking mode, the
> next API operation I issue on that SSL * must be the exact same API
> operation, with the same parameters.

I'm not sure where you see that, but that is obviously bogus. Suppose we
have a protocol that permits either side to read or write at any time. We
call SSL_read in case the other side sent something, it returns WANT_READ,
because the other side didn't send anything. Are you seriously arguing that
we now can't send anything until the other side does? Isn't that obviously
an impossible requirement?

> We've also seen OpenSSL developers
> say as much on this mailing list in the past -- and in testing, I have
> in fact seen misbehavior with previous versions of OpenSSL (e.g. corrupt
> output SSL records) which the library should never be able to generate and
> which disappears if I impose this rule on my application's use of the API.

I have used OpenSSL for many, many years in non-blocking mode without these
crazy restrictions you think are necessary and never had a problem.

> Fortunately, because of the application-layer protocol I am using this
> cannot cause me to pause forever.  But, in the general case, as I
> understand it, you're positing that:
>
> 1) I see the file descriptor select as ready-for-read because a reneg
>    initiated by the peer was in process, and the peer sent some data
>    which will never cause any application-layer bytes to be ready for me.

Why did you select for read on the socket? How did you know that the OpenSSL
state machine needed to read from the socket to make further progress? Did
you assume that? On what basis? Again, you are trying to peer through the
OpenSSL black box.

> 2) I arrange not to SSL_write, temporarily, by setting a flag.
>
> 3) I call SSL_read(), which consumes the bytes the peer sent, and then,
>    since it has no app data to give me, returns -1 WANT_READ.
>
> 4) I sleep on the descriptor for read.  I won't do any other API
>    operation as I've excluded myself from writing.  If the peer will
>    never generate any data towards me of its own accord (if the
>    application-layer protocol lets it just go silent unless I send it
>    something) you conclude that I will sleep forever.

This is obvious silliness.

> I think in general your analysis of this possibility is correct, and
> that I should better investigate why I believe I need to avoid calling
> SSL_write in this case and how to fix it if it is really so inside
> OpenSSL.

If SSL_read returns WANT_READ, that means the SSL state machine cannot make
further progress (at least, assuming you have no application data to send)
until it can read from the socket. That is your cue to not call SSL_read
again until either the socket selects for readability or you call SSL_write.
(Any call to SSL_write can read application data from the socket.)

> But I also think that can't be the underlying issue causing the symptom
> I am actually seeing below.

I think it is, because renegotiation is one of the cases where your
assumptions fail.

> > >   7) The SSL session's file descriptor selects as ready for read,
> > >      I call SSL_read with the same 4096-byte buffer at the same address,
> > >      and SSL_read returns -1 and error is SSL_ERROR_SSL.
> > >
> > > I cannot understand why #7 occurs.  Is SSL_MODE_ENABLE_PARTIAL_WRITE just
> > > incompatible with non-blocking mode and renegotiations?

> > Perhaps the other side gave up waiting for you and closed the connection
> > down abruptly? What does the other side show when this happens?

> This happens in a matter of milliseconds; the other side shows either
> an SSL protocol error (though I haven't caught a bad SSL record coming
> out of OpenSSL towards the peer _yet_) or a close of the underlying
> TCP connection (FIN or RST sent towards the other side from my side).

Perhaps the other side doesn't understand the renegotiation request?

> It seems to me that no matter what else I may have wrong, *that* should
> not be the result of my control flow.  Let's suppose you're right and
> I can hang forever waiting for application data from the peer which
> never arrives, because of when I chose to select().  Nonetheless, it
> should not be the case that in a matter of milliseconds after #7,
> OpenSSL slams the door on the peer.

> And, curiously, if I don't use PARTIAL_WRITE, it doesn't do so...

I agree. And, by the way, I always use PARTIAL_WRITE. You may have found
some strange bug regarding PARTIAL_WRITE interacting with renegotiations.

DS

