Re: Store_url_rewrite for squid 3+
On 9/5/2012 8:00 AM, Amos Jeffries wrote:
> On 5/09/2012 4:10 p.m., Eliezer Croitoru wrote:
>> I'm reading some code (it will take a while) to maybe get a functional
>> store_url_rewrite for squid3+. Actually I was thinking about it a lot,
>> and the process should be very simple: use some stdin+stdout interface
>> like the url_rewrite one for a starter.
>
> Yes. The redirect.cc interface can be used. The caller code runs
> *_access checks and provides a callback function which uses the
> resulting URL to do things to store keys.
>
>> I think this can be done pretty fast by someone who knows the current
>> code. It's kind of replicating the url_rewrite code, changing all the
>> directive names from url_rewrite to store_url_rewrite, and letting
>> someone write the functions. How is this as a starter to maybe make it
>> work?
>
> Exactly what I was planning. Just a lack of store code knowledge
> getting in the way.
>
>> It should be done before the request is made, and I think before the
>> url_rewrite helper.
>
> At the point where the cache key is being decided, I think. Which is
> just before the HIT lookup, after adaptations like url_rewrite.
>
>> The next step after that is to take the "to be cached" entry and
>> change the key of the stored object, so it will be stored in the cache
>> under the new key and not, like the old store_url_rewrite, saved in a
>> format of old+new URL.
>> How about just starting with the basics of porting the whole
>> redirector as-is to store_url_rewrite? How hard can it be?
>
> Want to give it a try?

Well, I would like to, but it seems I'm lacking knowledge of C/C++ and the Squid structure. I do recognize that there is a language there with some structure, since I have written in Java and Ruby (I just wrote a small proxy for a specific protocol that peers two incoming TCP connections after verifying identity, and it seems to rock).

I was thinking of starting by creating a fake store_url_rewrite that will do all the debugging but make no changes except logging. If I/we can reach this stage, it will benefit anyone who wants to integrate a new interface into Squid.

So I started playing with the sources, but since Squid is a big puzzle for me I don't know where to put things or where to look for pieces of code related to redirect.cc. What I did as a starter was to take redirect.cc (3.2.1) and change all the method names I could find/think of to an "srewriter" naming, such as redirectRegisterWithCacheManager(); to srewriteRegisterWithCacheManager(); (file and diff attached). After that I tried to find anything related to redirect.cc in the makefiles and found references in ./src/Makefile.am and ./src/Makefile.in, so I suppose I should add srewrite.cc everywhere redirect.cc appears (but I don't know...). I haven't yet made the changes to srewrite.cc that would make it "harmless" and "fake".

Now the hard part for me: how would I find every use of redirect.cc in the rest of Squid and make sure to add the needed code for srewrite.cc? I can parse srewrite.cc, but since I don't know the code structure I wouldn't know what to extract and look for. I noticed kinkie did pretty things while parsing the code.

I took a peek at the old store_url_rewrite, and it seems to be divided into "client_side_storeurl_rewrite.c" and "store_rewrite.c", with some structs in structs.h for children, concurrency and command. In 3.2.1 it seems to be organized better, in only a couple of files such as redirect.cc and others.

Any leads?
Eliezer

--
Eliezer Croitoru
https://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer ngtech.co.il

#diff redirect.cc srewriter.cc
5c5
<  * DEBUG: section 61    Redirector
---
>  * DEBUG: section 61    Srewriter
55c55
< /// url maximum lengh + extra informations passed to redirector
---
> /// url maximum lengh + extra informations passed to srewriter
66c66
< } redirectStateData;
---
> } srewriterStateData;
68,71c68,71
< static HLPCB redirectHandleReply;
< static void redirectStateFree(redirectStateData * r);
< static helper *redirectors = NULL;
< static OBJH redirectStats;
---
> static HLPCB srewriteHandleReply;
> static void srewriterStateFree(srewriterStateData * r);
> static helper *srewriters = NULL;
> static OBJH srewriterStats;
73c73
< CBDATA_TYPE(redirectStateData);
---
> CBDATA_TYPE(srewriterStateData);
76c76
< redirectHandleReply(void *data, char *reply)
---
> srewriteHandleReply(void *data, char *reply)
78c78
< redirectStateData *r = static_cast(data);
---
> srewriterStateData *r = static_cast(data);
81c81
< debugs(61, 5, "redirectHandleRead: {" << (reply && *reply != '\0' ? reply : "") << "}");
---
> debugs(61, 5, "srewriteHandleRead: {" << (reply && *reply != '\0' ? reply : "") << "}");
94c94
< redirectStateFree(r);
---
> srewriterStateFree(r);
98c98
< redirectStateFree(redirectStateData * r)
---
> srewriterStateFree(srewriterStateData * r)
105c105
< redirectStats(StoreEntry * sentry)
---
> srewriterStats(StoreEntry * sentry)
107,108c107,108
< if (redirectors == NUL
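For illustration, the "harmless" reply handler described above could look roughly like this sketch. It is modeled on the redirectHandleReply()/cbdataReferenceValidDone() pattern visible in the attached diff; the NULL-reply convention for "no rewrite" and the exact member names are assumptions, not working Squid code:

    /*
     * Minimal sketch of a logging-only srewriter handler: it records the
     * helper's answer but always passes the original URL through, so
     * nothing in the store is affected.
     */
    static void
    srewriteHandleReply(void *data, char *reply)
    {
        srewriterStateData *r = static_cast<srewriterStateData *>(data);

        debugs(61, 5, "srewriteHandleReply: {" <<
               (reply && *reply != '\0' ? reply : "<NULL>") << "}");

        // "fake" mode: acknowledge the helper reply but decline to rewrite,
        // following the callback pattern from redirect.cc.
        void *cbdata;
        if (cbdataReferenceValidDone(r->data, &cbdata))
            r->handler(cbdata, NULL);   // assumed: NULL reply == "no rewrite"

        srewriterStateFree(r);
    }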
Re: Store_url_rewrite for squid 3+
On 5/09/2012 4:10 p.m., Eliezer Croitoru wrote:
> I'm reading some code (it will take a while) to maybe get a functional
> store_url_rewrite for squid3+. Actually I was thinking about it a lot,
> and the process should be very simple: use some stdin+stdout interface
> like the url_rewrite one for a starter.

Yes. The redirect.cc interface can be used. The caller code runs *_access checks and provides a callback function which uses the resulting URL to do things to store keys.

> I think this can be done pretty fast by someone who knows the current
> code. It's kind of replicating the url_rewrite code, changing all the
> directive names from url_rewrite to store_url_rewrite, and letting
> someone write the functions. How is this as a starter to maybe make it
> work?

Exactly what I was planning. Just a lack of store code knowledge getting in the way.

> It should be done before the request is made, and I think before the
> url_rewrite helper.

At the point where the cache key is being decided, I think. Which is just before the HIT lookup, after adaptations like url_rewrite.

> The next step after that is to take the "to be cached" entry and change
> the key of the stored object, so it will be stored in the cache under
> the new key and not, like the old store_url_rewrite, saved in a format
> of old+new URL.
> How about just starting with the basics of porting the whole redirector
> as-is to store_url_rewrite? How hard can it be?

Want to give it a try?

Amos
Store_url_rewrite for squid 3+
I'm reading some code (it will take a while) to maybe get a functional store_url_rewrite for squid3+. Actually I was thinking about it a lot, and the process should be very simple: use some stdin+stdout interface like the url_rewrite one for a starter.

I think this can be done pretty fast by someone who knows the current code. It's kind of replicating the url_rewrite code, changing all the directive names from url_rewrite to store_url_rewrite, and letting someone write the functions. How is this as a starter to maybe make it work?

It should be done before the request is made, and I think before the url_rewrite helper. The next step after that is to take the "to be cached" entry and change the key of the stored object, so it will be stored in the cache under the new key and not, like the old store_url_rewrite, saved in a format of old+new URL.

How about just starting with the basics of porting the whole redirector as-is to store_url_rewrite? How hard can it be?

Thanks,
Eliezer

--
Eliezer Croitoru
https://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer ngtech.co.il
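For reference, the stdin+stdout contract mentioned above is very small. Below is a minimal pass-through helper sketch, assuming the request-line layout and blank-line "no change" reply of the existing url_rewrite interface (and no concurrency channel-IDs) — both assumptions, not a documented store_url_rewrite protocol:

    // Fake store-URL helper: logs each request line and declines to rewrite.
    #include <iostream>
    #include <string>

    int main()
    {
        std::string line;
        while (std::getline(std::cin, line)) {
            // line looks roughly like: URL client_ip/fqdn user method [...]
            // A real helper would normalize the URL here; this one only logs.
            std::cerr << "store-url request: " << line << std::endl;
            std::cout << std::endl;   // blank reply line: keep original URL
        }
        return 0;
    }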
About bug 3620: the IL mirror is not up to date.
I spoke with my ISP and they double-checked; the reason for the page appearing to be from 3 weeks ago is not their cache. They did some more research, and it's the same from two different local ISPs that do not provide the hosting services for the IL mirror.

I added two screenshots with traceroutes to help determine the cause of this problem:
http://bugs.squid-cache.org/show_bug.cgi?id=3620

If it's not in these two ISPs, then the reason is something on the server or at the hosting company. (Somebody mentioned cache? LoL)

I hope it will be resolved soon, since it's annoying to try to get 3.2.1 and get a 404 error.

Thanks,
Eliezer

--
Eliezer Croitoru
https://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer ngtech.co.il
Re: Native FTP proxying
On 09/04/2012 05:14 PM, Henrik Nordström wrote:
> On Tue 2012-09-04 at 16:29 -0600, Alex Rousskov wrote:
>
>> Why would "the second" approach be required by some? In other words,
>> what kind of FTP functionality can the first approach not support? I do
>> not favor the second approach, but I would like to know what we are
>> losing by using the first approach (other than some performance).
>
> FTP is a rather messy protocol with many commands, including
> site-specific commands, which cannot be easily mapped to HTTP semantics.
>
> And server state is also quite complex compared to HTTP.
>
> If you do
>
>     cd /some/path/symlink
>     cd ..
>
> then you end up in .. relative to the symlink, not in /some/path/, which
> is a bit of a headache for mapping longer sessions.

Sure, but apart from keeping client and server connections related (via the pinned connection API?), why would Squid care about this? It will just forward the first CD command and then the second, relaying each response to the client (all wrapped in a fake HTTP request while inside Squid core). Squid does not need to understand the details of the command semantics in this case, or even keep "current directory" state, right?

> On servers using drive names/letters, state gets even messier, as you
> then usually have one active path per drive, i.e. Windows/DOS and
> various old mainframe OSes.

The same "Squid does not care" logic would seem to apply there.

> But we do not need to cater for them all unless you plan on intercepting
> port 21. And even then most of the odd ones can be ignored if there is
> an acl to bypass interception on selected sites or users.

Agreed: there will probably always be cases where Squid has to care but cannot handle some FTP specifics. It would be nice to have some specific examples of those (the "cd" one above does not seem to apply).

Thank you,

Alex.
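To make the "wrapped in a fake HTTP request" idea concrete, here is a hypothetical sketch of such a wrapper. The method name, struct, and helper function are illustrative assumptions, not Squid API; the point is only that the core needs a URI for ACLs and routing while the FTP command itself is relayed uninterpreted:

    #include <string>

    // A fake HTTP envelope carrying one FTP control command through
    // Squid's HTTP-oriented core; unwrapped again on the server side.
    struct FakeHttpRequest {
        std::string method;       // a reserved internal method, e.g. "FTP"
        std::string uri;          // ftp://host/ so ACLs and routing work
        std::string ftpCommand;   // original control command, relayed as-is
    };

    // Wrap "CWD /some/path/symlink" without interpreting it: the core does
    // not need the command semantics or any "current directory" state.
    FakeHttpRequest wrapFtpCommand(const std::string &host,
                                   const std::string &commandLine)
    {
        FakeHttpRequest req;
        req.method = "FTP";
        req.uri = "ftp://" + host + "/";
        req.ftpCommand = commandLine;
        return req;
    }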
Re: Native FTP proxying
On Tue 2012-09-04 at 16:29 -0600, Alex Rousskov wrote:

> Why would "the second" approach be required by some? In other words,
> what kind of FTP functionality can the first approach not support? I do
> not favor the second approach, but I would like to know what we are
> losing by using the first approach (other than some performance).

FTP is a rather messy protocol with many commands, including site-specific commands, which cannot be easily mapped to HTTP semantics.

And server state is also quite complex compared to HTTP. If you do

    cd /some/path/symlink
    cd ..

then you end up in .. relative to the symlink, not in /some/path/, which is a bit of a headache for mapping longer sessions.

On servers using drive names/letters, state gets even messier, as you then usually have one active path per drive, i.e. Windows/DOS and various old mainframe OSes.

But we do not need to cater for them all unless you plan on intercepting port 21. And even then most of the odd ones can be ignored if there is an acl to bypass interception on selected sites or users.

Regards
Henrik
Re: Native FTP proxying
On 09/04/2012 03:04 PM, Henrik Nordström wrote:
> On Tue 2012-09-04 at 12:22 -0600, Alex Rousskov wrote:
>
>>> A FTP client-side might be accepted, allowing Squid to act as an
>>> ftp->ftp gateway using HTTP semantics internally, even accepting
>>> ftp->http gatewaying if you want. But I am not sure it makes sense to
>>> add native FTP proxy support.
>>
>> Can you clarify the difference between "ftp->ftp gateway" and "native
>> FTP proxy support"? Please assume that we are "using HTTP semantics
>> internally" in both cases, which I interpret as not altering the
>> forward.cc and Store code much -- the new FTP code will convert FTP
>> commands into HTTP messages for internal processing, and the code
>> outside the FTP client and server modules will remain pretty much the
>> same.
>
> An ftp->ftp gateway using HTTP semantics internally is an FTP server
> component (client_side..) that accepts FTP commands from FTP clients and
> maps them to HTTP semantics, which is then proxied by Squid.
>
> Native FTP proxy support means acting as an FTP proxy relaying FTP
> commands between client and requested server without going via an
> internal HTTP semantics mapping.
>
> The first is very feasible and maps nicely to Squid.
>
> The second is very hard to fit within Squid in a meaningful manner other
> than to provide basic server-level access control, but is required by
> some..

Why would "the second" approach be required by some? In other words, what kind of FTP functionality can the first approach not support? I do not favor the second approach, but I would like to know what we are losing by using the first approach (other than some performance).

Thank you,

Alex.
Re: Native FTP proxying
On Tue 2012-09-04 at 12:22 -0600, Alex Rousskov wrote:

>> A FTP client-side might be accepted, allowing Squid to act as an
>> ftp->ftp gateway using HTTP semantics internally, even accepting
>> ftp->http gatewaying if you want. But I am not sure it makes sense to
>> add native FTP proxy support.
>
> Can you clarify the difference between "ftp->ftp gateway" and "native
> FTP proxy support"? Please assume that we are "using HTTP semantics
> internally" in both cases, which I interpret as not altering the
> forward.cc and Store code much -- the new FTP code will convert FTP
> commands into HTTP messages for internal processing, and the code
> outside the FTP client and server modules will remain pretty much the
> same.

An ftp->ftp gateway using HTTP semantics internally is an FTP server component (client_side..) that accepts FTP commands from FTP clients and maps them to HTTP semantics, which is then proxied by Squid.

Native FTP proxy support means acting as an FTP proxy relaying FTP commands between client and requested server without going via an internal HTTP semantics mapping.

The first is very feasible and maps nicely to Squid.

The second is very hard to fit within Squid in a meaningful manner other than to provide basic server-level access control, but is required by some..

Regards
Henrik
Re: BZR repository upgrade re-proposal.
On Tue 2012-09-04 at 11:00 -0600, Alex Rousskov wrote:

> It would be nice to have specific instructions on how to update a local
> branch to support 2a. I see the "bzr upgrade" command, but it is not
> clear whether that is sufficient, especially with all the talk about two
> repositories (old and 2a).

upgrade is sufficient, but quite time consuming even if you only have one branch to upgrade, as it needs to redo the complete history from revision 1.

An alternative is to reuse the already-upgraded repository to avoid upgrading past revisions already upgraded. This is done by using a shared repository (many branches sharing the same store), where you first branch the already-upgraded branches and then your own branches:

    bzr init-repo squid-new
    cd squid-new
    bzr branch bzr+ssh://bzr.squid-cache.org/bzr/squid3/trunk
    bzr branch bzr+ssh://bzr.squid-cache.org/bzr/squid3/3.2
    bzr branch bzr+ssh://bzr.squid-cache.org/bzr/squid3/3.1
    bzr branch bzr+ssh://bzr.squid-cache.org/bzr/squid3/3.0
    bzr branch /path/to/your/repo name
    [repeat for each branch]

You need to reset push/pull locations etc. afterwards. You can rename/move things as you please, including a full rename of the main repository path (squid-new). The only requirement is that branches need to stay somewhere below the repository location.

Regards
Henrik
Re: Native FTP proxying
On 08/17/2012 06:47 PM, Henrik Nordström wrote:
> On Fri 2012-08-17 at 18:03 -0600, Alex Rousskov wrote:
>
>> What do you think? Should a quality implementation of native FTP
>> support be accepted by the Squid Project (with specific major design
>> choices to be discussed before the development, as usual)?
>
> A FTP client-side might be accepted, allowing Squid to act as an
> ftp->ftp gateway using HTTP semantics internally, even accepting
> ftp->http gatewaying if you want. But I am not sure it makes sense to
> add native FTP proxy support.

Can you clarify the difference between "ftp->ftp gateway" and "native FTP proxy support"? Please assume that we are "using HTTP semantics internally" in both cases, which I interpret as not altering the forward.cc and Store code much -- the new FTP code will convert FTP commands into HTTP messages for internal processing, and the code outside the FTP client and server modules will remain pretty much the same.

Thank you,

Alex.
Re: BZR repository upgrade re-proposal.
On 09/04/2012 03:28 AM, Kinkie wrote:
> Great.
> I'm holding off the merge for a few days; I hope that this discussion
> will complete positively soon.

IIRC, I was unhappy about the earlier upgrade proposal because bzr on older but still supported RHEL boxes did not support 2a at the time, and I did not want to tell folks to install bzr manually. RH seems to have migrated to newer bzr versions (I see bzr v2.1.1 or later), so I do not see any reason to delay the upgrade further.

It would be nice to have specific instructions on how to update a local branch to support 2a. I see the "bzr upgrade" command, but it is not clear whether that is sufficient, especially with all the talk about two repositories (old and 2a).

Thank you,

Alex.
Re: ICAP connections under heavy loads
On 09/04/2012 03:10 AM, Alexander Komyagin wrote:
> On Fri, 2012-08-31 at 11:05 -0600, Alex Rousskov wrote:
>> On 08/31/2012 09:01 AM, Alexander Komyagin wrote:
>>> Alex, I figured it out, finally! The bug was in the comm_connect_addr()
>>> function (I suppose it is kernel-dependent though).
>>>
>>> Consider the following call trace:
>>> 1) Xaction starts ConnOpener in order to create a new connection to ICAP
>>> 2) ConnOpener calls comm_connect_addr()
>>> 3) comm_connect_addr() initiates a connection through connect() and
>>> returns COMM_INPROGRESS
>>> ...since our ICAP service is too busy, the connection will eventually
>>> time out (connectTimeout_), so...
>>> 4) Comm::ConnOpener::timeout() is called
>>> 5) Comm::ConnOpener::timeout() calls connect()
>>> 6) connect() calls comm_connect_addr()
>>> 7) comm_connect_addr() calls getsockopt(SOL_SOCKET, SO_ERROR) to check
>>> current connection status
>>> 8) getsockopt(..) returns errno 0 <--- BUG
>>
>> Actually, the bug is in step #5. If we timed out, we should not try to
>> connect again.
>
> In steps (5) and (6) it's not the OS connect() call, but
> Comm::ConnOpener::connect(),

Yes, I know.

> so technically it's OK to call it on
> timeout as long as we are sure that it is OK.

Not sure what the "technically" qualifier means in this context, but IMO it is not correct, and it is a significant source of bugs and mess in Squid: the original author knows what the code is supposed to be doing but uses an unfortunate method name. As time passes, right code is added to the wrong method, and eventually nobody can tell what the method is supposed to be doing.

The connect() method should try to connect. We should not try to connect after a timeout. Thus, we should not call the connect() method after a timeout.

> I completely agree with you, Alex, that logically ConnOpener::timeout()
> had better not call ConnOpener::connect(), though that is just bad
> design and not the actual source of the problem. The problem is that
> the current comm subsystem design allows upper-level code to
> successfully perform comm_connect_addr() calls without waiting for
> Select() notification.

I do not want to argue whether "bad design" is "the source of the problem" in this case. If fixing that "bad design" would fix the problem, then we should do that rather than trying to recover from an incorrect Comm call-order sequence. The latter has been proven very difficult in the past, and I doubt this case is an exception.

>> Step #8 seems correct -- from the OS point of view, an internal Squid
>> timeout is not an error. If you fix step #5, step #8 will not happen
>> though.
>>
>>> 9) comm_connect_addr() returns COMM_OK, so ConnOpener would think that
>>> the connection was successfully established and pass the connection
>>> back to the Xaction object; then we would have noteCommRead and
>>> noteCommWrote exceptions and so on...
>>>
>>> According to `man connect`, when connect() returns errno EINPROGRESS:
>>> EINPROGRESS
>>>     The socket is nonblocking and the connection cannot be completed
>>>     immediately. It is possible to select(2) or poll(2) for
>>>     completion by selecting the socket for writing. After select(2)
>>>     indicates writability, use getsockopt(2) to read the SO_ERROR
>>>     option at level SOL_SOCKET to determine whether connect()
>>>     completed successfully (SO_ERROR is zero) or unsuccessfully
>>>     (SO_ERROR is one of the usual error codes listed here,
>>>     explaining the reason for the failure).
>>>
>>> Actually ConnOpener uses SetSelect(..) in order to wait before calling
>>> comm_connect_addr() (and connect()) again, but timeout in fact breaks
>>> the rule, calling getsockopt() right after connect().
>>
>> The above man page excerpt says to call getsockopt() after successful
>> select() notification. I think that is what we do now.
>
> Nope. On timeout it turns out that we're doing getsockopt() without
> waiting for select() notification. That's the thing that shall be fixed
> (maybe tricky) or correctly handled (as I did).

I misinterpreted your "the rule calling getsockopt() right after connect()" as your definition of the rule and just re-stated the correct rule. Sorry. I do not take from-timeout-handler calls into account here because, as discussed above, the properly fixed code will not have them. I agree that the current code may violate the calling sequence during timeouts.

>> * For the conn-open-timeout patch:
>>
>> Overall, I think this patch is fighting the side effects of other bugs.
>> Most importantly, the timeout handler should abort the ConnOpener job
>> on the spot rather than go through one more select() try. I will
>> comment on specific changes below, just in case, but please keep this
>> overall problem in mind.
>>
>>>     case COMM_INPROGRESS:
>>>         // check for timeout FIRST.
>>> -        if (squid_curtime - connectStart_ > connectTimeout_) {
>>> +        if (squid_curtime - connectStart_ >= connectTimeout_) {
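The man page excerpt quoted above describes the standard non-blocking connect sequence. A minimal standalone sketch of it (plain POSIX, not Squid's ConnOpener code; the function name is illustrative) shows why the timeout path must bail out instead of re-checking SO_ERROR:

    // connect() -> select() for writability -> getsockopt(SO_ERROR).
    // SO_ERROR is only meaningful after select() reports writability;
    // on timeout we abort rather than consult it.
    #include <sys/socket.h>
    #include <sys/select.h>
    #include <fcntl.h>
    #include <errno.h>

    // Returns 0 on success, -1 on failure or timeout.
    int connectNonBlocking(int fd, const struct sockaddr *addr,
                           socklen_t len, int timeoutSecs)
    {
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);

        if (connect(fd, addr, len) == 0)
            return 0;                   // connected immediately
        if (errno != EINPROGRESS)
            return -1;                  // hard failure

        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        struct timeval tv = { timeoutSecs, 0 };

        if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
            return -1;  // timed out (or error): SO_ERROR is not valid yet

        int soErr = 0;
        socklen_t soLen = sizeof(soErr);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soErr, &soLen) < 0)
            return -1;
        return soErr == 0 ? 0 : -1;     // SO_ERROR == 0 means connected
    }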
Re: ICAP connections under heavy loads
On 09/01/2012 03:26 AM, Amos Jeffries wrote:
> On 1/09/2012 5:05 a.m., Alex Rousskov wrote:
>> * For the conn-open-timeout patch:
>>
>> Most importantly, the timeout handler should abort the ConnOpener job
>> on the spot rather than go through one more select() try.
>
> ConnOpener should be obeying the squid.conf connect_retries option for
> the number of times it closes the socket, re-opens, and re-connect()s.

Yes, but IMO the timeout has higher priority than connect_retries. In other words, if we timed out, we should bail, even if we tried fewer than connect_retries times. The squid.conf documentation seems to agree with me:

> This sets the maximum number of connection attempts made for each
> TCP connection. The connect_retries attempts must all still
> complete within the connection timeout period.

Three designs are possible:

1. Bail after N retries, using a T-second timeout for each try.

2. Bail after N retries or T seconds, whichever comes first.

3. Bail after N retries (using a T1-second timeout for each try) or
   after T2 seconds, whichever comes first.

We should not do #1, IMO, because we want to limit the total transaction wait time in most cases (and care less about lower-level details such as the number of retries).

Documentation says that we do #2. I think we should do that. If we do not do that, we should fix the code. Specifically, ConnOpener's timeout handler should not call the connect() method.

#3 is the most flexible approach, but it requires more squid.conf options to distinguish a single-connect timeout from the aggregate connect timeout. I am not against that, but #2 seems sufficient until we have well-justified requests for #3.

Hope this clarifies,

Alex.
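A small sketch of design #2, for concreteness: a retry loop bounded by both a retry count and an overall deadline, where the deadline always wins. This is illustrative only (not ConnOpener code); tryConnectOnce() is a hypothetical helper standing in for one bounded connect attempt:

    #include <ctime>

    bool tryConnectOnce();  // assumed: performs one connect attempt

    bool connectWithPolicy(int maxRetries, std::time_t connectTimeout)
    {
        const std::time_t deadline = std::time(nullptr) + connectTimeout;

        for (int attempt = 0; attempt < maxRetries; ++attempt) {
            if (std::time(nullptr) >= deadline)
                return false;      // overall timeout wins over retry budget
            if (tryConnectOnce())
                return true;       // connected
        }
        return false;              // N retries exhausted first
    }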
Re: ICAP connections under heavy loads
On 09/01/2012 03:26 AM, Amos Jeffries wrote:
>> /// a helper job that connects to the ICAP service and
>> /// calls our "connector" callback; unused on pconns
>> Comm::ConnOpener *opener;
>>
>> It would be better to declare this just above the "connector"
>> declaration to keep opener and connector close to each other.
>
> Why do we need to have a pointer to the opener AsyncJob anyway?
> Cancelling the Xaction callback instead is what I thought the correct
> action was supposed to be.

Alexander is trying to optimize the case where the requestor is gone and no more connects (with all their retries) and notifications are needed. Callback cancellation is sufficient for things to work correctly, but Alexander wants them to work faster or more efficiently.

HTH,

Alex.
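A self-contained sketch of the two cleanup strategies being compared, with stub types standing in for Squid's AsyncCall and ConnOpener (member names mirror the quoted code; abortNow() is a hypothetical job-abort entry point, not real Squid API):

    #include <cstdio>

    struct AsyncCallStub {
        bool cancelled = false;
        void cancel(const char *reason) {   // mirrors the AsyncCall cancel idea
            cancelled = true;
            std::printf("callback cancelled: %s\n", reason);
        }
    };

    struct ConnOpenerStub {
        void abortNow(const char *reason) { // hypothetical early abort
            std::printf("opener job aborted: %s\n", reason);
        }
    };

    struct XactionSketch {
        AsyncCallStub *connector = nullptr; // the "connector" callback
        ConnOpenerStub *opener = nullptr;   // pointer kept for optimization

        void stopWaitingForConnection() {
            // Correctness: a cancelled callback is never delivered, but the
            // opener would keep retrying until it gives up on its own.
            if (connector)
                connector->cancel("xaction gone");
            // Optimization: stop the opener job itself, so no further
            // connect retries or notifications happen at all.
            if (opener)
                opener->abortNow("xaction gone");
        }
    };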
Re: BZR repository upgrade re-proposal.
On Tue, Sep 4, 2012 at 2:25 AM, Henrik Nordström wrote:
> On Tue 2012-09-04 at 11:41 +1200, Amos Jeffries wrote:
>
>>> There is also an alternative layout tree at /bzr/squid3-new-2a/ which
>>> flattens the tree to
>>>
>>>     ../
>>>     trunk
>>>     3.0
>>>     3.1
>>>     3.2
>>>
>>> getting rid of the CVS references and other legacy.
>>
>> Does this obsolete the CVS mirror completely? Or do you mean that it is
>> all pushed into the mirror sub-directories system instead of the bzr
>> repo?
>
> I did not mirror the CVS branches into squid3-new-2a. But trunk has
> full history as before. It is easy to copy over any additional branches
> if needed, and the old repository will still be available to
> "resurrect" stuff from, only locked to prevent any accidental commits.
>
>> What do you mean by "other legacy"?
>
> There is a bit of cruft that has been collecting in branches/.
>
> There are likely also some other active branches we want to keep, like
> "mswin". The new layout is only a suggestion picking the most obvious
> main branches.
>
>> Okay. As long as we have a scheduled time and a process how-to
>> distributed, I don't see any problems there.
>
> Agreed. Upgrading using a local shared repository is fairly quick (some
> minutes at most).
>
>> Do we have administrative time to see this through before the 15th of
>> September? I will be wanting to branch 3.3 by the end of this month,
>> and it would make sense to deal with fewer branches in the migration.
>
> The actual switch is only two mv commands.
>
> If there are any persistent local bzr repositories in the build farm or
> anywhere else, then those need to be verified as 2a format. But I don't
> think we have any of those, do we?

There are, but I will take care of that; the biggest impact will be a failed build.

> Add a couple of symlinks and we can support both the branches/ and the
> proposed flattened layout at the same time, minimizing the number of
> scripts and branch URLs that need updating. We can then successively
> phase out the branches/ URLs when we see fit.

Great. I'm holding off the merge for a few days; I hope that this discussion will complete positively soon. Robert said on IRC that his suggestion is to upgrade.

--
    /kinkie
Re: ICAP connections under heavy loads
On Fri, 2012-08-31 at 11:05 -0600, Alex Rousskov wrote:
> On 08/31/2012 09:01 AM, Alexander Komyagin wrote:
>> Alex, I figured it out, finally! The bug was in the comm_connect_addr()
>> function (I suppose it is kernel-dependent though).
>>
>> Consider the following call trace:
>> 1) Xaction starts ConnOpener in order to create a new connection to ICAP
>> 2) ConnOpener calls comm_connect_addr()
>> 3) comm_connect_addr() initiates a connection through connect() and
>> returns COMM_INPROGRESS
>> ...since our ICAP service is too busy, the connection will eventually
>> time out (connectTimeout_), so...
>> 4) Comm::ConnOpener::timeout() is called
>> 5) Comm::ConnOpener::timeout() calls connect()
>> 6) connect() calls comm_connect_addr()
>> 7) comm_connect_addr() calls getsockopt(SOL_SOCKET, SO_ERROR) to check
>> current connection status
>> 8) getsockopt(..) returns errno 0 <--- BUG
>
> Actually, the bug is in step #5. If we timed out, we should not try to
> connect again.

In steps (5) and (6) it's not the OS connect() call but Comm::ConnOpener::connect(), so technically it's OK to call it on timeout as long as we are sure that it is OK.

I completely agree with you, Alex, that logically ConnOpener::timeout() had better not call ConnOpener::connect(), though that is just bad design and not the actual source of the problem. The problem is that the current comm subsystem design allows upper-level code to successfully perform comm_connect_addr() calls without waiting for Select() notification.

> Step #8 seems correct -- from the OS point of view, an internal Squid
> timeout is not an error. If you fix step #5, step #8 will not happen
> though.
>
>> 9) comm_connect_addr() returns COMM_OK, so ConnOpener would think that
>> the connection was successfully established and pass the connection
>> back to the Xaction object; then we would have noteCommRead and
>> noteCommWrote exceptions and so on...
>>
>> According to `man connect`, when connect() returns errno EINPROGRESS:
>> EINPROGRESS
>>     The socket is nonblocking and the connection cannot be completed
>>     immediately. It is possible to select(2) or poll(2) for
>>     completion by selecting the socket for writing. After select(2)
>>     indicates writability, use getsockopt(2) to read the SO_ERROR
>>     option at level SOL_SOCKET to determine whether connect()
>>     completed successfully (SO_ERROR is zero) or unsuccessfully
>>     (SO_ERROR is one of the usual error codes listed here,
>>     explaining the reason for the failure).
>>
>> Actually ConnOpener uses SetSelect(..) in order to wait before calling
>> comm_connect_addr() (and connect()) again, but timeout in fact breaks
>> the rule, calling getsockopt() right after connect().
>
> The above man page excerpt says to call getsockopt() after successful
> select() notification. I think that is what we do now.

Nope. On timeout it turns out that we're doing getsockopt() without waiting for select() notification. That's the thing that shall be fixed (maybe tricky) or correctly handled (as I did).

> * For the conn-open-timeout patch:
>
> Overall, I think this patch is fighting the side effects of other bugs.
> Most importantly, the timeout handler should abort the ConnOpener job
> on the spot rather than go through one more select() try. I will
> comment on specific changes below, just in case, but please keep this
> overall problem in mind.
>
>>     case COMM_INPROGRESS:
>>         // check for timeout FIRST.
>> -        if (squid_curtime - connectStart_ > connectTimeout_) {
>> +        if (squid_curtime - connectStart_ >= connectTimeout_) {
>>             debugs(5, 5, HERE << conn_ << ": * - ERR took too long already.");
>
> Please revert this change. In theory, we can be called back again before
> (squid_curtime - connectStart_) changes. Not a big deal, of course (the
> equality should be a very rare event), but the old code was more
> accurate from a logic/math point of view.
>
> Also, if the debug message were to print by how much we were late,
> printing zero would look strange.
>
> If there is some hidden side effect or dependency here that I missed,
> please let me know (but note that I think this code must not be called
> from the timeout handler -- as discussed above, the timeout handler
> should abort instead of double-checking whether there is a timeout).

I noticed that under lots of requests with *proper* ConnOpener timeout handling (after my fix), Squid sometimes fails on an assertion in fd.cc:109 - closing an FD that is not open. So I assumed there was some inconsistency with ConnOpener timeouts. Honestly, I didn't find the exact cause of the failure, but I found that the underlying code in comm.cc checks the timeout by the strong condition:

    (squid_curtime - fd_table[fd].writeStart) < Config.Timeout.write

So the strong condition in ConnOpener could cause that inconsistency. After changing the condition to be weak (good from logic/math