Looks like the number was originally 3, and then bumped up to 10 because of
observed disconnects "in practice":
http://code.sixapart.com/trac/djabberd/changeset/758

On Thu, Feb 28, 2008 at 11:33 AM, Jacob Burkhart <[EMAIL PROTECTED]> wrote:

> Hi,
> So we've been experiencing some bizarre behavior with clients randomly
> disconnecting when trying to send messages over SSL. The disconnects vary
> with message size and network connection speed.
>
> The problem seems to stem from:
>
> http://code.sixapart.com/trac/djabberd/browser/trunk/DJabberd/lib/DJabberd/Connection.pm#L494
>
> In http://code.sixapart.com/trac/djabberd/changeset/756 bradfitz added
> code to handle non-graceful SSL disconnects. However, the number of read
> retries allowed before giving up (10) proved to be one or two too few in
> our particular situation.
>
> But why did he pick 10?
>
> In our tests, the number of times that Net::SSLeay::read can return zero
> bytes in a row varies with the speed of the client's connection and the
> number of bytes being sent.
>
> During stream initiation, for instance, we see about 2 zero-byte reads
> in a row from clients on reasonably fast connections, and none from a
> client running on the same machine as the server.
>
> When sending messages larger than 8k, the number of zero-byte reads is
> somewhere under 10 for clients connecting over local Ethernet, but is in the
> 11 to 12 range for clients connecting over WiFi.
>
> So clearly, counting consecutive zero-byte reads is not a reliable way
> to determine whether the client has disconnected improperly.
>
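
To make the failure mode concrete, here is a rough Python model of the
"give up after N consecutive zero-byte reads" heuristic described above. It
is not the Perl code in Connection.pm: the function name `disconnects` and the
exact cut-off semantics (close once the run of zero reads exceeds `limit`)
are assumptions for illustration. Fed read patterns shaped like the numbers
reported for >8k messages, a limit of 10 tolerates the Ethernet case but
drops the WiFi one.

```python
from typing import Iterable


def disconnects(read_sizes: Iterable[int], limit: int = 10) -> bool:
    """Hypothetical model of the zero-read retry heuristic.

    read_sizes: byte counts returned by successive SSL reads.
    Returns True once more than `limit` consecutive reads return zero
    bytes, i.e. the point where the heuristic would close the connection
    (assumed semantics, not taken from DJabberd).
    """
    streak = 0
    for n in read_sizes:
        if n == 0:
            streak += 1
            if streak > limit:
                return True
        else:
            streak = 0  # any successful read resets the counter
    return False


# Read patterns loosely shaped like the reported numbers for an >8k message.
ethernet = [0] * 9 + [8192]
wifi = [0] * 11 + [8192]
print(disconnects(ethernet))  # False
print(disconnects(wifi))      # True
```

Raising `limit` (from 3 to 10, or beyond) only moves the threshold; a slow
enough link can still produce a longer run of zero reads.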
> It turns out that the DJabberd code for SSL writes already takes a
> different approach to handling write failures:
>
>
> http://code.sixapart.com/trac/djabberd/browser/trunk/DJabberd/lib/DJabberd/Stanza/StartTLS.pm#L90
>
> Here, get_error is called to determine whether the failure was caused by
> SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE. If so, this is not a fatal
> condition and we don't want to close the connection. Instead, we simply
> return zero and expect the writer to come around and try again later.
>
> So why not do the same thing for read failures?
>
> Please consider my patch (attached) as a possible solution for this
> problem.
>
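
For illustration, the idea of classifying read failures by error code can be
sketched with Python's `ssl` module, which surfaces OpenSSL's
SSL_ERROR_WANT_READ / SSL_ERROR_WANT_WRITE as `ssl.SSLWantReadError` /
`ssl.SSLWantWriteError`, much as Net::SSLeay::get_error returns the raw codes.
This is not the attached patch or DJabberd's code; `classify_read` and
`ReadResult` are hypothetical names. A client `SSLObject` over in-memory BIOs,
with no server bytes yet, stands in for a slow peer: the read reports
WANT_READ and is classified as "retry later", not as a disconnect.

```python
import ssl
from enum import Enum
from typing import Tuple


class ReadResult(Enum):
    DATA = "data"      # got application bytes
    RETRY = "retry"    # WANT_READ / WANT_WRITE: not fatal, try again later
    CLOSED = "closed"  # peer shut TLS down
    FATAL = "fatal"    # any other SSL error: drop the connection


def classify_read(sslobj: ssl.SSLObject, size: int = 4096) -> Tuple[ReadResult, bytes]:
    """Hypothetical read handler: decide by error code, not by counting zeros."""
    try:
        data = sslobj.read(size)
    except (ssl.SSLWantReadError, ssl.SSLWantWriteError):
        # The TLS layer needs more I/O before it can return data.
        return ReadResult.RETRY, b""
    except ssl.SSLZeroReturnError:
        return ReadResult.CLOSED, b""
    except ssl.SSLError:
        return ReadResult.FATAL, b""
    # An empty result (no exception) is treated as a clean shutdown.
    if not data:
        return ReadResult.CLOSED, b""
    return ReadResult.DATA, data


# Demo: a client over in-memory BIOs with no server bytes yet looks like a
# slow peer. The read starts the handshake and then needs more input.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
client = ctx.wrap_bio(incoming, outgoing)
status, _ = classify_read(client)
print(status)  # ReadResult.RETRY
```

With this approach a slow link just produces more RETRY results; only an
actual SSL error or shutdown reaches the branches that close the connection.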
> thanks,
> Jacob
