David Schwartz wrote:
Joakim Tjernlund wrote:

I once wrote this patch to solve a problem which I logged to be:
  Let EAGAIN be fatal for write to socket. Needed
  to unlock a hung connection where the www client has
  stopped reading its socket.

Umm, if your code asks to wait forever until it can write, then that is what
it should do. The fix is nonsensical.

I cannot remember the details so I post the patch in hope it makes
sense to anyone. Possibly I have solved the problem at the wrong place.
Anyhow, here goes the patch:

You definitely solved the problem in the wrong place. If you don't want to
wait forever for a write to be possible, then don't do that. But if you do
that, then that's what should happen.

A "hung connection" *should* remain locked if the client stops reading from
its socket. The client might continue reading from its connection in a day,
a week, or a year. If the server wants to timeout the connection, it can and
should do so.

I agree with David's comments. Maybe Joakim should try to remember the details, as the devil is in the details. In particular, EAGAIN should only be observed from the operating system when the socket has been put into non-blocking mode.

Observing an EAGAIN from a write() or send() system call is an indication that the sending side is able to queue data faster than the receiving side is able to dequeue it. This is a normal, anticipated and expected condition, not an error. This is usually the case with most applications, since the receiving side has to also account for:
 * Network congestion, temporary network blockage/failure
 * Network propagation delay (round trip time)
 * TCP Protocol congestion window (it takes an amount of time for the TCP window to open up, in the order of 6xRTT)
 * The time it takes the program at the remote end to call read()/recv() on the network socket
 * Often also the processing time for the data (often the biggest/slowest factor)

The sending side only has to account for:
 * Ability to generate the data and hand it to the kernel. Modern disk subsystems can deliver over 100 MB/sec, which is faster than even gigabit Ethernet.

It is normal to deal with this by using select() or poll() (or similar) to perform a limited wait. The kernel returns control to the application once either the socket becomes writable (the EAGAIN condition has cleared, in which case you retry your write()/send()) or the timeout has expired (in which case you can decide to treat the stall as fatal yourself and close the socket). The select() and poll() system calls let the application give up its CPU timeslice and get it back when the temporary I/O wait condition has cleared. The Win32 API has WaitForSingleObject(), which does a similar thing.
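To make the limited-wait idea concrete, here is a minimal sketch in C on a plain (non-SSL) POSIX socket. The helper names set_nonblocking() and write_with_timeout() are hypothetical, invented for this illustration; they are not part of OpenSSL or of Joakim's patch. The timeout policy (give up after timeout_ms of no progress) is likewise only an assumption about what a server might want.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: put fd into non-blocking mode, so that send()
 * reports EAGAIN instead of blocking. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: send all of buf on a non-blocking socket.
 * Each time the kernel reports EAGAIN, wait at most timeout_ms for the
 * socket to become writable again. Returns 0 when everything was sent,
 * or -1 with errno set (ETIMEDOUT if the peer made no room in time). */
int write_with_timeout(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t off = 0;

    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n >= 0) {
            off += (size_t)n;          /* partial sends are normal */
            continue;
        }
        if (errno == EINTR)
            continue;                  /* interrupted, just retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                 /* a real error */

        /* EAGAIN: the send buffer is full. Give up the CPU until the
         * socket is writable again or the timeout expires. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
        int rc;
        do {
            rc = poll(&pfd, 1, timeout_ms);
        } while (rc < 0 && errno == EINTR);

        if (rc == 0) {                 /* application policy: stall is fatal */
            errno = ETIMEDOUT;
            return -1;
        }
        if (rc < 0)
            return -1;
        /* Writable (or POLLERR/POLLHUP): loop and let send() report it. */
    }
    return 0;
}
```

On ETIMEDOUT the caller would close the socket; that is the application making the stall fatal, rather than the library treating EAGAIN itself as an error. A real server would also want to ignore or suppress SIGPIPE. With a non-blocking socket under OpenSSL, the equivalent signal is SSL_write() failing with SSL_get_error() returning SSL_ERROR_WANT_WRITE (or SSL_ERROR_WANT_READ), after which the same kind of poll()-then-retry loop applies.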



My gut reaction is the same: Joakim's usage of the OpenSSL API was probably incorrect, through lack of a full understanding of the issues.

To show that his point of view is correct, Joakim should provide a complete test case, which could then be peer reviewed.


Darryl
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
