Re: Potential fix

Darryl Miles Tue, 18 May 2010 00:25:32 -0700

David Schwartz wrote:

Joakim Tjernlund wrote:

I once wrote this patch to solve a problem which I logged to be:
  Let EAGAIN be fatal for write to socket. Needed
  to unlock a hung connection where the www client has
  stopped reading its socket.


Umm, if your code asks to wait forever until it can write, then that is what
it should do. The fix is nonsensical.

I cannot remember the details so I post the patch in hope it makes
sense to anyone. Possibly I have solved the problem at the wrong place.
Anyhow, here goes the patch:


You definitely solved the problem in the wrong place. If you don't want to
wait forever for a write to be possible, then don't do that. But if you do
that, then that's what should happen.

A "hung connection" *should* remain locked if the client stops reading from
its socket. The client might continue reading from its connection in a day,
a week, or a year. If the server wants to timeout the connection, it can and
should do so.

I agree with David's comments. Maybe Joakim should try to remember the details, as the devil is in the details. In particular, EAGAIN should only be observed from the operating system when the socket has been put into non-blocking mode.

Observing an EAGAIN from a write() or send() system call is an indication that the sending side is able to queue data faster than the receiving side is able to dequeue it. This is a normal, anticipated and expected condition, not an error. This is usually the case with most applications, since the receiving side has to also account for:

 * Network congestion, temporary network blockage/failure
 * Network propagation delay (round trip time)
 * TCP protocol congestion window (it takes an amount of time for the TCP window to open up, in the order of 6xRTT)
 * The time it takes the program at the remote end to call read()/recv() on the network socket
 * Often also the processing time for the data (often the biggest/slowest factor)


The sending side only has to account for:

 * Ability to generate the data and hand it to the kernel. Modern disk subsystems can deliver over 100Mb/sec, which is faster than even gigabit ethernet.

It is normal to deal with this issue by using select() or poll() (or similar) to perform a limited wait. The kernel will then return control back to the application once either the socket becomes writable (the EAGAIN condition has gone, in which case you retry your write()/send()) or a timeout has expired (in which case you can then decide to take action and make the EAGAIN condition fatal yourself by closing the socket). The select() and poll() kernel system calls provide a way for the application to give up its CPU timeslice and get it back again when the temporary I/O wait condition has cleared. The Win32 API has WaitForSingleObject(), which does a similar thing.
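To make the limited wait concrete, here is a minimal sketch in plain POSIX C, without any OpenSSL calls. The helper names set_nonblocking() and send_all_with_timeout() are made up for illustration, and reporting a stalled peer as ETIMEDOUT is my own choice, not something the kernel or OpenSSL does for you. It also ignores SIGPIPE handling and restarts the full timeout after EINTR, so treat it as a starting point only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: send all len bytes on a non-blocking socket.
 * Each time send() reports EAGAIN, wait up to timeout_ms for the socket
 * to become writable again. Returns 0 on success, -1 on failure with
 * errno set (ETIMEDOUT if the peer did not drain data in time).
 */
int send_all_with_timeout(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t off = 0;

    assert(buf != NULL);
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n >= 0) {
            off += (size_t)n;          /* partial sends are normal */
            continue;
        }
        if (errno == EINTR)
            continue;                  /* interrupted, just retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                 /* a real error */

        /* EAGAIN: give up the CPU until writable or until the timeout. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
        int rc;
        do {
            rc = poll(&pfd, 1, timeout_ms);
        } while (rc == -1 && errno == EINTR);

        if (rc == -1)
            return -1;                 /* poll() itself failed */
        if (rc == 0) {
            errno = ETIMEDOUT;         /* application policy: give up */
            return -1;
        }
        /* Writable (or error/hangup): loop and let send() report it. */
    }
    return 0;
}
```

The caller decides what a timeout means. Typically it would close() the socket when send_all_with_timeout() fails with ETIMEDOUT, which is the point where the application, not the library, turns EAGAIN into something fatal.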

My gut reaction is the same: Joakim's usage of the OpenSSL API was probably incorrect through lack of a full understanding of the issues.

In order for Joakim to prove his point of view is correct, Joakim should provide a complete test case. This could then be peer reviewed.



Darryl
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
