Re: lynx-dev Lynx problem and fix for Open UNIX 8

Bela Lubkin Sat, 02 Mar 2002 14:42:42 -0800

Jonathan Schilling <[EMAIL PROTECTED]> wrote:

> >[Bela Lubkin of Caldera (and a past lynx contributor) suggests a bolder
> >change, involving coalescing the errno checking after all the networking
> >calls into one place, so that all these errno dependencies can be
> >centralized.  It's a good notion, but I don't feel confident enough in
> >my understanding of either lynx or networking across platforms to submit
> >it myself.]

Then I will submit it.  ;-}

Here is a copy of the message (actually a newsgroup posting) to which
Jon refers.  It is untested material, meant as fodder for Tom Dickey
rather than a definite patch.

Following the patch message is a reply I made to some questions Jon had
about integrating it with Lynx; possibly useful as background material.

>Bela<

=============================================================================

Date: Tue, 26 Feb 2002 23:48:43 -0800
From: Bela Lubkin <[EMAIL PROTECTED]>
Subject: Re: Stange problem with lynx2-8-3 or 5 on OpenUNIX 8
Newsgroups: comp.unix.sco.programmer
Message-ID: <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]> 
<[EMAIL PROTECTED]>

J. L. Schilling wrote:

> Boyd Lynn Gerber <[EMAIL PROTECTED]> wrote in message
> news:<[EMAIL PROTECTED]>...
> > I have a really strange problem after compiling lynx with either gcc or
> > cc.  I can connect to local hosts on my network without any problems but
> > I cannot connect to any remote hosts.  For example
> > 
> > ./lynx http://www.caldera.com
> > 
> > I get Alert!: Unable to connect to remote host.
> 
> OK, I've been playing around with this in Lynx 2.8.2.  The problem happens
> in the HTDoConnect function in HTTCP.c.  The first connect() attempt fails
> with an EINPROGRESS, meaning the operation is still ongoing and can be
> completed asynchronously.  Then a select() is done to wait for that completion,
> with a timeout value of 100,000 microseconds.  That times out with no ready
> fds returned.  Finally a second connect() is tried, but that also gives
> EINPROGRESS.  At this point the function gives up and returns an error.
> 
> I've found that if I increase the select() timeout value to 250,000 micro-
> seconds, then the select returns successfully with a ready fd, and the
> connection is successfully made, and the remote web site comes up in
> the browser.  (The change is at line 1469 of HTTCP.c in Lynx 2.8.2.)
> 
> I'm not sure why OU8 has slower connection times than UW7 ... moreover,
> on one of my OU8 machines the existing Lynx does connect successfully,
> and I'm not sure what's different about it (the /etc/resolv.conf is
> very similar, if that's involved) ... but anyway, why don't you try
> making the same change in your Lynx 2.8.x and post here with whether
> that fixes the problem for you as well.

Thanks for the analysis.  I can now see that Lynx already knows how to
handle this situation, but there's a slight dialect problem (exacerbated
by Lynx's organic growth over the years...)

I don't have an OU8 machine to play with so I can't fix this myself, but
I'm sure you can, Jon.

First, the absolute most recent Lynx source is:

  http://lynx.isc.org/current/lynx2.8.5dev.7.tar.bz2

Second, within its source file WWW/Library/Implementation/HTTCP.c, there
are four places where it tests whether SOCKET_ERRNO is either
EINPROGRESS or EALREADY.  In each case, it expects only one of those
errnos (plus, in each case, a few other different ones).

To fix the current problem, you're going to need to add EINPROGRESS to
the list of expected errnos in at least one of the EALREADY cases.
Apparently OpenUNIX and Solaris return different errnos for the same
condition.

I recommend examining all four and possibly replacing all of them with a
utility function which asks the question: "in accordance with the
dialect of Unix we're being compiled under, did the last socket syscall
fail with _any_ of the known `try again' error codes?"

Here's such a function, more-or-less in Lynx's idiom.  Slot it into
HTTCP.c, change the various references to EALREADY & EINPROGRESS to use
it, and see how things go.  Then submit diffs to [EMAIL PROTECTED]

>Bela<

=============================================================================
/*
 * Given that a socket syscall failed, this function asks: was that a
 * permanent failure, or is it a softer errno that means we should try
 * again (or the thing we've tried to do has already succeeded)?
 *
 * This should be called on the failure path, e.g.:
 *
 *      ret = connect( ... );
 *      if (ret < 0 && sock_call_failed(SOCKET_ERRNO)) {
 *              ... failure handling code ...
 *      }
 */

PUBLIC int sock_call_failed ARGS1(
        int,            socket_errno)
{
        switch(socket_errno) {
#ifdef       EINPROGRESS
        case EINPROGRESS:
#endif    /* EINPROGRESS */

#ifdef       EAGAIN
        case EAGAIN:
#endif    /* EAGAIN */

#ifdef       EALREADY
        case EALREADY:
#endif    /* EALREADY */

#ifdef       EISCONN
        case EISCONN:
#endif    /* EISCONN */

#ifdef       UCX        /* For some reason, UCX pre 3 apparently returns */
        case 18242:     /* errno = 18242 instead of EALREADY or EISCONN. */
#endif    /* UCX */

                return 0;  /* success or some sort of soft failure */

        default:
                return 1;  /* permanent failure, as far as we can tell */
        }
}
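
To show how a call site might look after the change, here is a sketch
that compiles outside the Lynx tree.  It restates sock_call_failed() in
plain ANSI C (no PUBLIC/ARGS1 macros, and without the UCX case); the
enum and the classify_connect() helper are hypothetical names invented
for this example, not anything in HTTCP.c.

```c
#include <errno.h>

/* ANSI-C restatement of sock_call_failed() so this builds standalone. */
int sock_call_failed(int socket_errno)
{
    switch (socket_errno) {
#ifdef EINPROGRESS
    case EINPROGRESS:
#endif
#ifdef EAGAIN
    case EAGAIN:
#endif
#ifdef EALREADY
    case EALREADY:
#endif
#ifdef EISCONN
    case EISCONN:
#endif
        return 0;   /* success or some sort of soft failure */
    default:
        return 1;   /* permanent failure, as far as we can tell */
    }
}

/* Hypothetical three-way result for a connect()-style call. */
enum connect_status {
    CONNECT_OK,      /* the call itself succeeded */
    CONNECT_SOFT,    /* soft errno: still in progress, or already done */
    CONNECT_FAILED   /* permanent failure */
};

/* Hypothetical call-site helper: classify a return value plus errno. */
enum connect_status classify_connect(int ret, int socket_errno)
{
    if (ret >= 0)
        return CONNECT_OK;
    if (sock_call_failed(socket_errno))
        return CONNECT_FAILED;
    return CONNECT_SOFT;
}
```

Each place that currently compares SOCKET_ERRNO against EINPROGRESS or
EALREADY by hand would then reduce to one call along the lines of
classify_connect(ret, SOCKET_ERRNO).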

=============================================================================

Date: Wed, 27 Feb 2002 10:43:07 -0800
From: Bela Lubkin <[EMAIL PROTECTED]>
To: Jonathan Schilling <[EMAIL PROTECTED]>
Cc: Boyd Lynn Gerber <[EMAIL PROTECTED]>, Ron Record <[EMAIL PROTECTED]>
Subject: Re: A solution thanks to J. L. Schilling... to lynx
Message-ID: <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>

Jonathan Schilling wrote:

> Yes, I realized last night that I was making a cheap fix by lengthening
> the time-out period on the select().  One motivation was just to see if
> the same change fixed the problem on Boyd's machines, which apparently
> it does.  Another motivation was that I was leery of diving into the
> connect() code, because I've never well understood all the errno's that
> can come from it.
> 
> Your proposed solution certainly looks attractive ... but are you sure
> that it is safe, on all the different platforms that Lynx runs on?

No, but lynx-dev will take care of that by further modifying the
function as problems arise.  The net effect would still be a decent
cleanup of a shred of Lynx code.  (I was an active participant in
lynx-dev for several years so there's some pride of ownership here --
no, I wasn't responsible for any of this particular mess ;-)

> Are there any cases where one of these errnos is used for something
> different?  I'm especially suspicious of EAGAIN, which should inherently
> mean something else (our connection hasn't even started, because the
> thing isn't available -- this should have a longer time-out period
> than connection in progress, no?).  Also, in looking on the net I have
> seen cases where people add EWOULDBLOCK as a soft error deserving
> a retry as well.  (Our SCOhelp doc on non-blocking sockets says that 
> connect() can return EWOULDBLOCK, but our connect() man page doesn't
> list it.)  What do you think about EWOULDBLOCK?

I thought about adding it, but decided that should wait for someone
actually observing the need.  I think of EWOULDBLOCK as something that is
perhaps not permanent, but more than just transient -- "if this went
through you would deadlock".
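
If someone does add it later, there is one wrinkle to watch for: some
systems (Linux, for instance) define EWOULDBLOCK to the same value as
EAGAIN, and duplicate case labels will not compile.  A sketch of the
guard, using a hypothetical function name and only a subset of the
errnos; this is not part of the proposed patch:

```c
#include <errno.h>

/*
 * Hypothetical variant, for illustration only: returns 1 if e is one of
 * the "try again" codes, 0 otherwise.  EWOULDBLOCK gets its own case
 * label only where it is a distinct value from EAGAIN.
 */
int is_soft_socket_errno(int e)
{
    switch (e) {
#ifdef EINPROGRESS
    case EINPROGRESS:
#endif
#ifdef EAGAIN
    case EAGAIN:
#endif
#if defined(EWOULDBLOCK) && (!defined(EAGAIN) || EWOULDBLOCK != EAGAIN)
    case EWOULDBLOCK:   /* only when it is not just another name for EAGAIN */
#endif
        return 1;
    default:
        return 0;
    }
}
```

Where the two names share a value, the EAGAIN label already covers
EWOULDBLOCK, so the function behaves the same either way.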

> In sum, I'm not sure what to do at this point.  Are you optimistic that
> lynx-dev would take back your proposed solution?

If you send a machine-patchable diff against their current top of tree
code, along with the fact that you've tested it on a system where the
change was needed (OU8) and one where it was not (UW711), I'm pretty
sure they would take it.  The central maintainer is Tom Dickey; he's
very, very good about integrating received patches.

> And none of this explains why OU8 differs from UW7 in how long it takes
> to make the connection.  Given that this Mosaic/Lynx HTTCP code is 
> frequently referenced on the net as a model for a non-blocking connect()
> (and thus copied into other programs), it's a little worrying that OU8
> behaves differently.

Yes, that ought to be pursued as well.

First thing I would check is that the timeout mechanism in select() is
working right -- that it isn't somehow getting scaled (down) on OU8.
It's entirely unreasonable that a connect() might take even 0.1s,
never mind 0.25s, if all the handshaking has already happened at the TCP
level (which I think it should have in this case).
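
A quick way to check the select() timeout is to time it with no
descriptors at all.  This is an untested-on-OU8 sketch; the function
name measure_select_timeout is invented here, and it assumes
gettimeofday() is available and trustworthy on the system under test.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical test helper: sleep in select() with no descriptors for
 * timeout_usec microseconds and return how long it really took, in
 * microseconds, or -1 if select() or gettimeofday() failed.
 */
long measure_select_timeout(long timeout_usec)
{
    struct timeval before, after, tv;

    tv.tv_sec = timeout_usec / 1000000L;
    tv.tv_usec = timeout_usec % 1000000L;

    if (gettimeofday(&before, NULL) < 0)
        return -1;
    if (select(0, NULL, NULL, NULL, &tv) < 0)
        return -1;
    if (gettimeofday(&after, NULL) < 0)
        return -1;

    /* Elapsed wall-clock time in microseconds. */
    return (long)(after.tv_sec - before.tv_sec) * 1000000L
         + (long)(after.tv_usec - before.tv_usec);
}
```

On a healthy system measure_select_timeout(100000L) should come back
close to 100000.  A much smaller number on OU8 would point at the
timeout being scaled down.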

>Bela<

