Jonathan Schilling <[EMAIL PROTECTED]> wrote:

> >[Bela Lubkin of Caldera (and a past lynx contributor) suggests a bolder
> >change, involving coalescing the errno checking after all the networking
> >calls into one place, so that all these errno dependencies can be
> >centralized.  It's a good notion, but I don't feel confident enough in
> >my understanding of either lynx or networking across platforms to submit
> >it myself.]

Then I will submit it.  ;-}

Here is a copy of the message (actually a newsgroup posting) to which
Jon refers.  It is untested material, meant as fodder for Tom Dickey
rather than a definite patch.  Following the patch message is a reply I
made to some questions Jon had about integrating it with Lynx; possibly
useful as background material.

>Bela<

=============================================================================

Date: Tue, 26 Feb 2002 23:48:43 -0800
From: Bela Lubkin <[EMAIL PROTECTED]>
Subject: Re: Stange problem with lynx2-8-3 or 5 on OpenUNIX 8
Newsgroups: comp.unix.sco.programmer
Message-ID: <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>

J. L. Schilling wrote:

> Boyd Lynn Gerber <[EMAIL PROTECTED]> wrote in message
> news:<[EMAIL PROTECTED]>...
> > I have a really strange problem after compiling lynx with either gcc or
> > cc.  I can connect to local hosts on my network without any problems but
> > I cannot connect to any remote hosts.  For example
> >
> > ./lynx http://www.caldera.com
> >
> > I get Alert!: Unable to connect to remote host.
>
> OK, I've been playing around with this in Lynx 2.8.2.  The problem happens
> in the HTDoConnect function in HTTCP.c.  The first connect() attempt fails
> with an EINPROGRESS, meaning the operation is still ongoing and can be
> completed asynchronously.  Then a select() is done to wait for that
> completion, with a timeout value of 100,000 microseconds.  That times out
> with no ready fds returned.  Finally a second connect() is tried, but that
> also gives EINPROGRESS.  At this point the function gives up and returns
> an error.
>
> I've found that if I increase the select() timeout value to 250,000
> microseconds, then the select returns successfully with a ready fd, and
> the connection is successfully made, and the remote web site comes up in
> the browser.  (The change is at line 1469 of HTTCP.c in Lynx 2.8.2.)
>
> I'm not sure why OU8 has slower connection times than UW7 ...
> moreover, on one of my OU8 machines the existing Lynx does connect
> successfully, and I'm not sure what's different about it (the
> /etc/resolv.conf is very similar, if that's involved) ... but anyway,
> why don't you try making the same change in your Lynx 2.8.x and post
> here with whether that fixes the problem for you as well.

Thanks for the analysis.  I can now see that Lynx already knows how to
handle this situation, but there's a slight dialect problem (exacerbated
by Lynx's organic growth over the years...)

I don't have an OU8 machine to play with so I can't fix this myself, but
I'm sure you can, Jon.

First, the absolute most recent Lynx source is:

  http://lynx.isc.org/current/lynx2.8.5dev.7.tar.bz2

Second, within its source file WWW/Library/Implementation/HTTCP.c, there
are four places where it tests whether SOCKET_ERRNO is either
EINPROGRESS or EALREADY.  In each case, it expects only one of those
errnos (plus, in each case, a few other different ones).

To fix the current problem, you're going to need to add EINPROGRESS to
the list of expected errnos in at least one of the EALREADY cases.
Apparently OpenUNIX and Solaris return different errnos for the same
condition.  I recommend examining all four and possibly replacing all of
them with a utility function which asks the question: "in accordance
with the dialect of Unix we're being compiled under, did the last socket
syscall fail with _any_ of the known `try again' error codes?"

Here's such a function, more-or-less in Lynx's idiom.  Slot it into
HTTCP.c, change the various references to EALREADY & EINPROGRESS to use
it, and see how things go.  Then submit diffs to [EMAIL PROTECTED]

>Bela<

=============================================================================

/*
 *  Given that a socket syscall failed, this function asks: was that a
 *  permanent failure, or is it a softer errno that means we should try
 *  again (or the thing we've tried to do has already succeeded)?
 *
 *  This should be called on the failure path, e.g.:
 *
 *      ret = connect( ... );
 *      if (ret < 0 && sock_call_failed(SOCKET_ERRNO)) {
 *          ... failure handling code ...
 *      }
 */
PUBLIC int sock_call_failed ARGS1(
        int,            socket_errno)
{
    switch (socket_errno) {
#ifdef EINPROGRESS
    case EINPROGRESS:
#endif /* EINPROGRESS */
#ifdef EAGAIN
    case EAGAIN:
#endif /* EAGAIN */
#ifdef EALREADY
    case EALREADY:
#endif /* EALREADY */
#ifdef EISCONN
    case EISCONN:
#endif /* EISCONN */
#ifdef UCX      /* For some reason, UCX pre 3 apparently returns */
    case 18242: /* errno = 18242 instead of EALREADY or EISCONN. */
#endif /* UCX */
        return 0;       /* success or some sort of soft failure */
    default:
        return 1;       /* permanent failure, as far as we can tell */
    }
}

=============================================================================

Date: Wed, 27 Feb 2002 10:43:07 -0800
From: Bela Lubkin <[EMAIL PROTECTED]>
To: Jonathan Schilling <[EMAIL PROTECTED]>
Cc: Boyd Lynn Gerber <[EMAIL PROTECTED]>, Ron Record <[EMAIL PROTECTED]>
Subject: Re: A solution thanks to J. L. Schilling... to lynx
Message-ID: <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>

Jonathan Schilling wrote:

> Yes, I realized last night that I was making a cheap fix by lengthening
> the time-out period on the select().  One motivation was just to see if
> the same change fixed the problem on Boyd's machines, which apparently
> it does.  Another motivation was that I was leery of diving into the
> connect() code, because I've never well understood all the errno's that
> can come from it.
>
> Your proposed solution certainly looks attractive ... but are you sure
> that it is safe, on all the different platforms that Lynx runs on?

No, but lynx-dev will take care of that by further modifying the
function as problems arise.  The net effect would still be a decent
cleanup of a shred of Lynx code.
(I was an active participant in lynx-dev for several years so there's
some pride of ownership here -- no, I wasn't responsible for any of
this particular mess ;-)

> Are there any cases where one of these errnos is used for something
> different?  I'm especially suspicious of EAGAIN, which should inherently
> mean something else (our connection hasn't even started, because the
> thing isn't available -- this should have a longer time-out period
> than connection in progress, no?).  Also, in looking on the net I have
> seen cases where people add EWOULDBLOCK as a soft error deserving
> a retry as well.  (Our SCOhelp doc on non-blocking sockets says that
> connect() can return EWOULDBLOCK, but our connect() man page doesn't
> list it.)  What do you think about EWOULDBLOCK?

I thought about adding it, but decided that should wait until someone
actually observes the need.  I think of EWOULDBLOCK as something that is
perhaps not permanent, but more than just transient -- "if this went
through you would deadlock".

> In sum, I'm not sure what to do at this point.  Are you optimistic that
> lynx-dev would take back your proposed solution?

If you send a machine-patchable diff against their current top of tree
code, along with the fact that you've tested it on a system where the
change was needed (OU8) and one where it was not (UW711), I'm pretty
sure they would take it.  The central maintainer is Tom Dickey; he's
very good about integrating received patches.

> And none of this explains why OU8 differs from UW7 in how long it takes
> to make the connection.  Given that this Mosaic/Lynx HTTCP code is
> frequently referenced on the net as a model for a non-blocking connect()
> (and thus copied into other programs), it's a little worrying that OU8
> behaves differently.

Yes, that ought to be pursued as well.  First thing I would check is
that the timeout mechanism in select() is working right -- that it isn't
somehow getting scaled (down) on OU8.
It's entirely unreasonable that a connect() might take even 0.1s, never
mind 0.25s, if all the handshaking has already happened at the TCP level
(which I think it should have in this case).

>Bela<
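
For readers who want to experiment with the connect()/select() sequence
discussed in this thread, here is a minimal standalone sketch in plain C.
It is not Lynx code and not the patch above: the names connect_pending()
and connect_with_timeout() are hypothetical, the set of "still in
progress" errnos is an assumption that may need adjusting per platform,
and instead of calling connect() a second time (as HTDoConnect is
described as doing) it uses the common getsockopt(SO_ERROR) idiom to
read the outcome once select() reports the socket writable.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if err means "the connection attempt is
 * still under way", 0 otherwise.  Which errnos belong here is an assumption.
 */
int connect_pending(int err)
{
    switch (err) {
    case EINPROGRESS:
    case EALREADY:
        return 1;
    default:
        return 0;
    }
}

/*
 * Hypothetical sketch: start a non-blocking connect() on fd and wait up to
 * timeout_usec microseconds for it to finish.  Returns 0 on success, -1 on
 * failure with errno set (ETIMEDOUT if the wait expired).  The socket is
 * left in non-blocking mode.
 */
int connect_with_timeout(int fd, const struct sockaddr *addr,
                         socklen_t addrlen, long timeout_usec)
{
    int flags, n, err = 0;
    socklen_t errlen = sizeof(err);
    fd_set wfds;
    struct timeval tv;

    /* select() can only watch descriptors below FD_SETSIZE. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EBADF;
        return -1;
    }

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    if (connect(fd, addr, addrlen) == 0 || errno == EISCONN)
        return 0;               /* connected immediately */
    if (!connect_pending(errno))
        return -1;              /* hard failure, errno already set */

    /* Wait for the socket to become writable, i.e. the attempt to end. */
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_usec / 1000000L;
    tv.tv_usec = timeout_usec % 1000000L;

    n = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (n < 0)
        return -1;              /* select() itself failed */
    if (n == 0) {
        errno = ETIMEDOUT;      /* nothing ready before the timeout */
        return -1;
    }

    /* Writable: ask the socket whether the connect succeeded or failed. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

With this shape, the 100,000 vs. 250,000 microsecond question becomes a
single timeout_usec argument, e.g. connect_with_timeout(fd, addr, len,
250000L), and the errno dialect question is confined to connect_pending().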
