looks like the number was original 3, and then bumped up to 10 because of observed disconnects "in practice" http://code.sixapart.com/trac/djabberd/changeset/758
On Thu, Feb 28, 2008 at 11:33 AM, Jacob Burkhart <[EMAIL PROTECTED]> wrote: > Hi, > So we've been experiencing some bizarre behavior with clients randomly > disconnecting when trying to send messages over SSL. The disconnects vary > based on size of message and speed of network connections. > > The problems seems to stem from: > > http://code.sixapart.com/trac/djabberd/browser/trunk/DJabberd/lib/DJabberd/Connection.pm#L494 > > In http://code.sixapart.com/trac/djabberd/changeset/756 bradfitz added > code to handle non-graceful SSL disconnects. However, the number of times > to retry read before giving up (10) proved to be 1 to 2 too few times in our > particular situation. > > But why did he pick 10? > > In our tests, the number of times that Net::SSLeay::read can return zero > bytes in a row varies based on the speed of a client's connection and the > size of bytes attempting to be sent. > > During stream initiation for instance, we experience about 2 reads of zero > in a row from clients connected on reasonably fast connections, and 0 reads > of zero in a row from a client running on the same machine as the server. > > When sending messages larger than 8k, the number of zero reads is > somewhere under 10 for clients connection over local ethernet, but is in the > 11 to 12 range for clients connecting over WiFi. > > So clearly, counting the number of zero bytes reads that happen is a row > is not the most reliable way to determine if the client has improperly > disconnected > > It turns out that if we look at the Djabberd code for SSL write, we can > see that there is a different answer for handling write failures. > > > http://code.sixapart.com/trac/djabberd/browser/trunk/DJabberd/lib/DJabberd/Stanza/StartTLS.pm#L90 > > Here, get_error is called to determine if the problem might have been > caused by SSL_ERROR_WANT_READ or perhaps SSL_ERROR_WANT_WRITE. If it is, > then this is not a fatal condition, and we don't want to close the > connection. instead, we simply return zero and expect that the writer will > come around and try again at some point. > > So why not do the same thing for read failures? > > Please consider my patch (attached) as a possible solution for this > problem. > > thanks, > Jacob > > > > > > >