I think I've discovered another problem with the current non-blocking API.

I have an application which reads data into fixed-size buffers that it
maintains per session.  It uses non-blocking I/O and calls select() whenever
SSL_get_error() reports SSL_ERROR_WANT_{READ,WRITE} after an SSL_read().

To conserve memory I reduced the buffer size from 16384 to 8192 and saw
sessions suddenly hang.  A coworker diagnosed this as follows:

1) The peer sends an SSL record larger than our buffer (a record can carry
   up to 16384 bytes of plaintext).

2) We receive the record, and select() reports the socket as ready to read.

3) We call SSL_read() with our 8K buffer.  The decrypted data does not fit,
   so OpenSSL buffers the rest internally and returns 8K with
   SSL_ERROR_WANT_READ.

4) We call select() for read on the socket again (see the quotation from the
   SSL_read manual page below!), but the socket never becomes ready, because
   OpenSSL has already consumed the record's bytes from the socket in order
   to decrypt it!
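To make the diagnosis concrete, here is a toy model in plain C of where the bytes sit after step 3.  It is not OpenSSL code: the `toy_` names are hypothetical, it assumes a single 16384-byte record, and it ignores record framing and MAC/padding overhead.  What it shows is that select() only ever looks at the kernel socket buffer, while the undelivered 8K now lives inside the SSL object.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical, not OpenSSL code): plaintext bytes still in the
 * kernel socket buffer versus bytes already pulled in and held inside the
 * SSL object.  Record framing and MAC/padding overhead are ignored. */
struct toy_conn {
    size_t on_socket; /* bytes that select()/poll() can still see */
    size_t buffered;  /* bytes held internally by the SSL layer */
};

/* select()/poll() inspect only the socket, never the SSL object. */
static bool toy_socket_readable(const struct toy_conn *c)
{
    return c->on_socket > 0;
}

/* Mimics SSL_read(): a whole record must be pulled off the socket before any
 * of it can be decrypted, but at most buf_size bytes are handed back.
 * Returns -1 where the real call would report SSL_ERROR_WANT_READ. */
static int toy_ssl_read(struct toy_conn *c, size_t buf_size)
{
    if (c->buffered == 0) {
        if (c->on_socket == 0)
            return -1;              /* nothing anywhere: must wait */
        c->buffered = c->on_socket; /* consume the entire record */
        c->on_socket = 0;
    }
    size_t n = c->buffered < buf_size ? c->buffered : buf_size;
    c->buffered -= n;
    return (int)n;
}
```

Starting from `{16384, 0}`, one `toy_ssl_read(&c, 8192)` returns 8192 and leaves the socket unreadable with 8192 bytes still buffered, which is exactly the state in which a select()-only loop hangs.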

The problem (again!  this pervades the non-blocking "API"!) is that a
single error code is used to indicate two different conditions which require
*different* application behavior.  If SSL_ERROR_WANT_READ was returned
because the application did not supply a buffer of sufficient size, then
the application must immediately call SSL_read() again, which contradicts
the manual page's description of the API.  But if SSL_ERROR_WANT_READ was
returned because the underlying file descriptor was not ready for reading,
the application must call select() or poll() again and wait.

The two cases can be told apart heuristically, but it would be far better
to return a different error code for each.  At the very least, the manual
page needs to be revised to alert API users to this bug and suggest a
workaround (I *think* it may be sufficient to always call SSL_read() again
whenever we actually got some data but also had SSL_ERROR_WANT_READ
returned).
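A minimal sketch of that heuristic as a pure decision function, assuming the caller passes in SSL_read()'s return value, the SSL_get_error() result mapped onto a hypothetical enum, and the byte count from SSL_pending() (a real OpenSSL call that reports decrypted bytes still buffered in the SSL object).  The enum and function names are made up for illustration; this is not an OpenSSL API.

```c
/* Hypothetical summary of what SSL_get_error() reported. */
enum toy_ssl_err { TOY_ERR_NONE, TOY_ERR_WANT_READ, TOY_ERR_WANT_WRITE, TOY_ERR_OTHER };

/* Hypothetical next step for a per-session event loop. */
enum toy_next {
    TOY_READ_AGAIN,    /* call SSL_read() again right away, skip select() */
    TOY_WAIT_READABLE, /* select()/poll() for read on the socket */
    TOY_WAIT_WRITABLE, /* select()/poll() for write on the socket */
    TOY_FAIL           /* EOF or a genuine error */
};

/* ret:     value returned by SSL_read()
 * err:     SSL_get_error() result, mapped onto toy_ssl_err (assumption)
 * pending: value returned by SSL_pending() after the read */
static enum toy_next toy_choose_next(int ret, enum toy_ssl_err err, int pending)
{
    /* We got data, or decrypted bytes remain inside the SSL object: the
     * socket may never become readable for them, so read again first. */
    if (ret > 0 || pending > 0)
        return TOY_READ_AGAIN;

    /* Nothing delivered and nothing buffered: now the WANT_* code really
     * does describe the socket, so waiting in select() is safe. */
    switch (err) {
    case TOY_ERR_WANT_READ:  return TOY_WAIT_READABLE;
    case TOY_ERR_WANT_WRITE: return TOY_WAIT_WRITABLE;
    default:                 return TOY_FAIL;
    }
}
```

The design choice is to treat "got data" and "data still pending" as reasons to loop on SSL_read() without touching select(); only when both are zero does the loop go back to waiting on the file descriptor.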

One possible workaround (gross, but feasible, I think) is to push one byte
back onto the socket so that select() will do the right thing.  *Shudder*.

Here is the manual page text which seems relevant:

       If the underlying BIO is non-blocking, SSL_read() will also return when
       the underlying BIO could not satisfy the needs of SSL_read() to continue
       the operation. In this case a call to SSL_get_error(3) with the
       return value of SSL_read() will yield SSL_ERROR_WANT_READ or
       SSL_ERROR_WANT_WRITE. As at any time a re-negotiation is possible, a
       call to SSL_read() can also cause write operations! The calling process
       then must repeat the call after taking appropriate action to satisfy
       the needs of SSL_read(). The action depends on the underlying BIO. When
       using a non-blocking socket, nothing is to be done, but select() can be
       used to check for the required condition. When using a buffering BIO,
       like a BIO pair, data must be written into or retrieved out of the BIO
       before being able to continue.

Thor
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org