Re: [squid-users] How get negative cache along with origin server error?
On Sat, Oct 04, 2008 at 12:55:15PM -0400, Chris Nighswonger wrote:
> On Tue, Sep 30, 2008 at 6:13 PM, Dave Dykstra [EMAIL PROTECTED] wrote:
> > On Thu, Sep 25, 2008 at 02:04:09PM -0500, Dave Dykstra wrote:
> > > I am running squid on over a thousand computers that are filtering
> > > data coming out of one of the particle collision detectors on the
> > > Large Hadron Collider.
>
> A bit off-topic here, but I'm wondering if these squids are being used
> in CERN's new computing grid? I noticed Fermi was helping out with this.
> (http://devicedaily.com/misc/cern-launches-the-biggest-computing-grid-in-the-world.html)

The particular squids I was talking about are not considered part of the grid; they're part of the High-Level Trigger filter farm installed at the location of the CMS detector. There are other squids that are considered part of the grid, however, at each of the locations around the world where CMS collision data is being analyzed.

I own the piece of the software involved in moving detector alignment and calibration data from CERN out to all the processors at all the collaboration sites; that data is needed to be able to understand the collision data. It is on the order of 100MB but needs to get sent to all the analysis jobs (and some of it changes every day or so), unlike the collision data, which is much larger but gets sent separately to individual processors. The software I own converts the data from a database to http, where it is cached in squids, and then converts the data from http to objects in memory. The home page is frontier.cern.ch.

That article is misleading, by the way; the very nature of a computing grid is that it doesn't belong to a single organization, so it's not CERN's new computing grid. It is a collaboration of many organizations: many different organizations provide the computing resources, and many different organizations provide the software that controls the grid and the software that runs on the grid.

- Dave
Re: [squid-users] How get negative cache along with origin server error?
Mark,

Thanks for that suggestion. I had independently come to the same idea after posting my message, but haven't yet had a chance to try it out. I currently have hierarchies of cache_peer parents, but I stop the hierarchies just before the last step to the origin servers, because the origin servers were selected by the host and port number in the URLs. The origin servers have their own squids configured in accelerator mode, so I think I will just extend the hierarchies all the way to them and let those squids (the ones which were formerly the top of the hierarchies) take care of detecting when an origin server goes down (using the cache_peer monitorurl option). I did a little experiment and found out that it doesn't matter what the host and port number in a URL are if the top of a cache_peer parent hierarchy is an accelerator-mode squid, so I don't think I'll even have to change the application.

- Dave

On Fri, Oct 03, 2008 at 11:21:19AM +1000, Mark Nottingham wrote:
> Have you considered setting squid up to know about both origins, so it
> can fail over automatically?
>
> On 26/09/2008, at 5:04 AM, Dave Dykstra wrote:
> > I am running squid on over a thousand computers that are filtering
> > data coming out of one of the particle collision detectors on the
> > Large Hadron Collider. There are two origin servers, and the
> > application layer is designed to try the second server if the local
> > squid returns a 5xx HTTP code (server error).
> >
> > I just recently found that before squid 2.7 this could never happen,
> > because squid would just return stale data if the origin server was
> > down (more precisely, I've been testing with the server up but the
> > listener process down, so it gets 'connection refused'). In squid
> > 2.7STABLE4, if squid.conf has 'max_stale 0' or if the origin server
> > sends 'Cache-Control: must-revalidate', then squid will send a 504
> > Gateway Timeout error. Unfortunately, this timeout error does not get
> > cached, and it gets sent upstream every time no matter what
> > negative_ttl is set to.
> >
> > These squids are configured in a hierarchy where each feeds 4 others
> > so loading gets spread out, but the fact that the error is not cached
> > at all means that if the primary origin server is down, the squids
> > near the top of the hierarchy will get hammered with hundreds of
> > requests for the server that's down before every request that
> > succeeds from the second server.
> >
> > Any suggestions? Is the fact that negative_ttl doesn't work with
> > max_stale a bug, a missing feature, or an unfortunate interpretation
> > of the HTTP 1.1 spec?
> >
> > By the way, I had hoped that 'Cache-Control: max-stale=0' would work
> > the same as squid.conf's 'max_stale 0', but I never see an error come
> > back when the origin server is down; it returns stale data instead. I
> > wonder if that's intentional, a bug, or a missing feature. I also
> > note that the HTTP 1.1 spec says that there MUST be a Warning 110
> > (Response is stale) header attached if stale data is returned, and
> > I'm not seeing those.
> >
> > - Dave
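The setup described above might look roughly like the following squid.conf fragment on the squids at the top of the hierarchy (hostnames, ports, and the monitor URL are hypothetical; only the directive names come from the thread):

```
# Two origin servers, each fronted by its own accelerator-mode squid.
# monitorurl lets squid probe each parent periodically and mark a dead
# parent down, so requests fail over to the other parent automatically.
cache_peer origin1.example.org parent 8000 0 no-query \
    monitorurl=http://origin1.example.org:8000/ping monitorinterval=30
cache_peer origin2.example.org parent 8000 0 no-query \
    monitorurl=http://origin2.example.org:8000/ping monitorinterval=30

# Always forward through the parents; never contact origins directly.
never_direct allow all
```

Because the parents run in accelerator mode, the host and port in the request URL need not match the peer, which is why the application would not need to change.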
Re: [squid-users] How get negative cache along with origin server error?
Henrik,

Thanks so much for your very informative reply!

On Thu, Oct 02, 2008 at 12:31:03PM +0200, Henrik Nordstrom wrote:
> By default Squid tries to use a parent 10 times before declaring it
> dead.

Ah, I never would have guessed that I needed to try 10 times before negative_ttl would take effect for a dead host. That wouldn't be bad at all.

I just tried this now by having two squids, one a cache_peer parent of the other. I requested a URL while the origin server was up in order to load the cache, with a Cache-Control max-age of 180. Both squids have max_stale 0 and a negative_ttl of 3 minutes. Next, I put the origin server name as an alias for localhost in /etc/hosts on both machines the squids were on, so they both see connection refused when they try to connect to the origin server. I also restarted nscd and did squid -k reconfigure to make sure the new host name was seen by squid. After the (small) object in the cache expired, I retried the request 20 times in a row. Every time I still saw the request get sent from the child squid to the parent squid and return a 504 error. This is unexpected to me; is it to you, Henrik? I would have thought the 504 error would get cached for three minutes after the tenth try.

> Each time Squid retries a request it falls back on the next possible
> path for forwarding the request. What that is depends on your
> configuration. In normal forwarding without never_direct there are
> usually at most two selected active paths: the selected peer, if any,
> plus going direct. In accelerator mode or with never_direct more peers
> are selected as candidates (one sibling, and all possible parents).
>
> These retries happen on:
> * 504 Gateway Timeout (including local connection failure)
> * 502 Bad Gateway
> or, if retry_on_error is enabled, also on:
> * 403 Forbidden
> * 500 Internal Server Error
> * 501 Not Implemented
> * 503 Service Unavailable
>
> Please note that there is a slight name confusion relating to
> max-stale. Cache-Control: max-stale is not the same as the squid.conf
> directive. Cache-Control: max-stale=N is a permissive request
> directive, saying that responses up to the given staleness are accepted
> as fresh without needing a cache validation. It's not defined for
> responses. The squid.conf setting is a restrictive directive, placing
> an upper limit on how stale content may be returned if cache
> validations fail. The Cache-Control: stale-if-error response header is
> equivalent to the squid.conf max-stale setting, and overrides
> squid.conf.

That's very good to know. I didn't see that in the HTTP 1.1 spec, but I see that Mark Nottingham submitted a draft protocol extension with this feature.

> The default for stale-if-error if not specified (and squid.conf
> max-stale) is infinite.
>
> Warning headers are not yet implemented by Squid. This is on the todo.

Sounds good.

- Dave
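The distinction Henrik draws between the permissive request directive and the restrictive error-time limit can be sketched as two small predicates (a simplified model for illustration only, not Squid's actual code; the function and parameter names are invented):

```python
def fresh_enough_for_request(age, freshness_lifetime, req_max_stale=None):
    """Cache-Control: max-stale=N on a *request* is permissive: the client
    accepts a response up to N seconds past freshness without revalidation.
    req_max_stale=None models a request with no max-stale directive."""
    staleness = age - freshness_lifetime
    if staleness <= 0:
        return True  # still fresh
    return req_max_stale is not None and staleness <= req_max_stale

def may_serve_stale_on_error(age, freshness_lifetime, stale_if_error=None):
    """Cache-Control: stale-if-error=N on a *response* (like squid.conf
    max_stale) is restrictive: it caps how stale a response may be served
    when revalidation fails. None models the default: no limit."""
    staleness = age - freshness_lifetime
    return stale_if_error is None or staleness <= stale_if_error
```

With `max_stale 0` (i.e. `stale_if_error=0` here), any staleness at all forbids serving the cached copy on error, which is what produces the 504 in Dave's test.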
Re: [squid-users] How get negative cache along with origin server error?
On tis, 2008-10-07 at 11:49 -0500, Dave Dykstra wrote:
> Ah, I never would have guessed that I needed to try 10 times before
> negative_ttl would take effect for a dead host. That wouldn't be bad at
> all.

You don't. Squid does that for you automatically.

> Every time I still saw the request get sent from the child squid to the
> parent squid and return a 504 error. This is unexpected to me; is it to
> you, Henrik? I would have thought the 504 error would get cached for
> three minutes after the tenth try.

Agreed.

> > The Cache-Control: stale-if-error response header is equivalent to
> > the squid.conf max-stale setting, and overrides squid.conf.
>
> That's very good to know. I didn't see that in the HTTP 1.1 spec, but I
> see that Mark Nottingham submitted a draft protocol extension with this
> feature.

Correct.

Regards
Henrik
Re: [squid-users] How get negative cache along with origin server error?
On Tue, Oct 07, 2008 at 08:38:12PM +0200, Henrik Nordstrom wrote:
> On tis, 2008-10-07 at 11:49 -0500, Dave Dykstra wrote:
> > Ah, I never would have guessed that I needed to try 10 times before
> > negative_ttl would take effect for a dead host. That wouldn't be bad
> > at all.
>
> You don't. Squid does that for you automatically.

I meant that in my testing I needed to try 10 times to see if negative_ttl caching was working. Or are you saying that squid tries to contact the origin server 10 times on the first request before it even returns the first 504? I thought you meant it kept track of the number of client attempts and should start caching the error after 10 failures.

> > Every time I still saw the request get sent from the child squid to
> > the parent squid and return a 504 error. This is unexpected to me; is
> > it to you, Henrik? I would have thought the 504 error would get
> > cached for three minutes after the tenth try.
>
> Agreed.

Ok, then I will file a bugzilla report. Meanwhile, I believe I have a workaround, as I discussed in another post on this thread:
http://www.squid-cache.org/mail-archive/squid-users/200810/0171.html

Thanks,
- Dave
Re: [squid-users] How get negative cache along with origin server error?
Hi Dave,

On Tue, Sep 30, 2008 at 6:13 PM, Dave Dykstra [EMAIL PROTECTED] wrote:
> I found out a little bit more by looking in the source code and the
> generated headers and setting a few breakpoints. The squid closest to
> the origin server that is down (the one at the top of the cache_peer
> parent hierarchy) never attempts to store the negative result. Worse,
> it sets an Expires: header that is equal to the current time. Squids
> further down the hierarchy do call storeNegativeCache(), but they see
> an expiration time that is already past, so it isn't of any use. Those
> things make it seem like squid is far from being able to effectively
> handle failing over from one origin server to another at the
> application level.
>
> - Dave
>
> On Tue, Sep 30, 2008 at 10:32:43AM -0500, Dave Dykstra wrote:
> > Do any of the squid experts have any answers for this?
> > - Dave
> >
> > On Thu, Sep 25, 2008 at 02:04:09PM -0500, Dave Dykstra wrote:
> > > I am running squid on over a thousand computers that are filtering
> > > data coming out of one of the particle collision detectors on the
> > > Large Hadron Collider.

A bit off-topic here, but I'm wondering if these squids are being used in CERN's new computing grid? I noticed Fermi was helping out with this.
(http://devicedaily.com/misc/cern-launches-the-biggest-computing-grid-in-the-world.html)

Regards,
Chris
Re: [squid-users] How get negative cache along with origin server error?
By default Squid tries to use a parent 10 times before declaring it dead.

Each time Squid retries a request it falls back on the next possible path for forwarding the request. What that is depends on your configuration. In normal forwarding without never_direct there are usually at most two selected active paths: the selected peer, if any, plus going direct. In accelerator mode or with never_direct more peers are selected as candidates (one sibling, and all possible parents).

These retries happen on:
* 504 Gateway Timeout (including local connection failure)
* 502 Bad Gateway
or, if retry_on_error is enabled, also on:
* 403 Forbidden
* 500 Internal Server Error
* 501 Not Implemented
* 503 Service Unavailable

Please note that there is a slight name confusion relating to max-stale. Cache-Control: max-stale is not the same as the squid.conf directive. Cache-Control: max-stale=N is a permissive request directive, saying that responses up to the given staleness are accepted as fresh without needing a cache validation. It's not defined for responses. The squid.conf setting is a restrictive directive, placing an upper limit on how stale content may be returned if cache validations fail. The Cache-Control: stale-if-error response header is equivalent to the squid.conf max-stale setting, and overrides squid.conf. The default for stale-if-error if not specified (and squid.conf max-stale) is infinite.

Warning headers are not yet implemented by Squid. This is on the todo.

Regards
Henrik

On tis, 2008-09-30 at 10:32 -0500, Dave Dykstra wrote:
> Do any of the squid experts have any answers for this?
> - Dave
>
> On Thu, Sep 25, 2008 at 02:04:09PM -0500, Dave Dykstra wrote:
> > I am running squid on over a thousand computers that are filtering
> > data coming out of one of the particle collision detectors on the
> > Large Hadron Collider. There are two origin servers, and the
> > application layer is designed to try the second server if the local
> > squid returns a 5xx HTTP code (server error).
> >
> > I just recently found that before squid 2.7 this could never happen,
> > because squid would just return stale data if the origin server was
> > down (more precisely, I've been testing with the server up but the
> > listener process down, so it gets 'connection refused'). In squid
> > 2.7STABLE4, if squid.conf has 'max_stale 0' or if the origin server
> > sends 'Cache-Control: must-revalidate', then squid will send a 504
> > Gateway Timeout error. Unfortunately, this timeout error does not get
> > cached, and it gets sent upstream every time no matter what
> > negative_ttl is set to.
> >
> > These squids are configured in a hierarchy where each feeds 4 others
> > so loading gets spread out, but the fact that the error is not cached
> > at all means that if the primary origin server is down, the squids
> > near the top of the hierarchy will get hammered with hundreds of
> > requests for the server that's down before every request that
> > succeeds from the second server.
> >
> > Any suggestions? Is the fact that negative_ttl doesn't work with
> > max_stale a bug, a missing feature, or an unfortunate interpretation
> > of the HTTP 1.1 spec?
> >
> > By the way, I had hoped that 'Cache-Control: max-stale=0' would work
> > the same as squid.conf's 'max_stale 0', but I never see an error come
> > back when the origin server is down; it returns stale data instead. I
> > wonder if that's intentional, a bug, or a missing feature. I also
> > note that the HTTP 1.1 spec says that there MUST be a Warning 110
> > (Response is stale) header attached if stale data is returned, and
> > I'm not seeing those.
> >
> > - Dave
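Henrik's retry rules can be summarized in a small predicate (a sketch of the behavior as described in the thread, not Squid's source; the names here are invented for illustration, and retry_on_error models the squid.conf directive of the same name):

```python
# Statuses Squid always retries on the next candidate forwarding path.
ALWAYS_RETRIED = {502, 504}  # Bad Gateway, Gateway Timeout

# Statuses retried only when retry_on_error is enabled in squid.conf.
RETRIED_ON_ERROR = {403, 500, 501, 503}

def is_reforwardable(status, retry_on_error=False):
    """Return True if, per the rules above, Squid would retry this
    response status on the next candidate path (another parent, or
    going direct) instead of returning it to the client."""
    if status in ALWAYS_RETRIED:
        return True
    return retry_on_error and status in RETRIED_ON_ERROR
```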
Re: [squid-users] How get negative cache along with origin server error?
Have you considered setting squid up to know about both origins, so it can fail over automatically?

On 26/09/2008, at 5:04 AM, Dave Dykstra wrote:
> I am running squid on over a thousand computers that are filtering data
> coming out of one of the particle collision detectors on the Large
> Hadron Collider. There are two origin servers, and the application
> layer is designed to try the second server if the local squid returns a
> 5xx HTTP code (server error).
>
> I just recently found that before squid 2.7 this could never happen,
> because squid would just return stale data if the origin server was
> down (more precisely, I've been testing with the server up but the
> listener process down, so it gets 'connection refused'). In squid
> 2.7STABLE4, if squid.conf has 'max_stale 0' or if the origin server
> sends 'Cache-Control: must-revalidate', then squid will send a 504
> Gateway Timeout error. Unfortunately, this timeout error does not get
> cached, and it gets sent upstream every time no matter what
> negative_ttl is set to.
>
> These squids are configured in a hierarchy where each feeds 4 others so
> loading gets spread out, but the fact that the error is not cached at
> all means that if the primary origin server is down, the squids near
> the top of the hierarchy will get hammered with hundreds of requests
> for the server that's down before every request that succeeds from the
> second server.
>
> Any suggestions? Is the fact that negative_ttl doesn't work with
> max_stale a bug, a missing feature, or an unfortunate interpretation of
> the HTTP 1.1 spec?
>
> By the way, I had hoped that 'Cache-Control: max-stale=0' would work
> the same as squid.conf's 'max_stale 0', but I never see an error come
> back when the origin server is down; it returns stale data instead. I
> wonder if that's intentional, a bug, or a missing feature. I also note
> that the HTTP 1.1 spec says that there MUST be a Warning 110 (Response
> is stale) header attached if stale data is returned, and I'm not seeing
> those.
>
> - Dave

--
Mark Nottingham
[EMAIL PROTECTED]
Re: [squid-users] How get negative cache along with origin server error?
Do any of the squid experts have any answers for this?

- Dave

On Thu, Sep 25, 2008 at 02:04:09PM -0500, Dave Dykstra wrote:
> I am running squid on over a thousand computers that are filtering data
> coming out of one of the particle collision detectors on the Large
> Hadron Collider. There are two origin servers, and the application
> layer is designed to try the second server if the local squid returns a
> 5xx HTTP code (server error).
>
> I just recently found that before squid 2.7 this could never happen,
> because squid would just return stale data if the origin server was
> down (more precisely, I've been testing with the server up but the
> listener process down, so it gets 'connection refused'). In squid
> 2.7STABLE4, if squid.conf has 'max_stale 0' or if the origin server
> sends 'Cache-Control: must-revalidate', then squid will send a 504
> Gateway Timeout error. Unfortunately, this timeout error does not get
> cached, and it gets sent upstream every time no matter what
> negative_ttl is set to.
>
> These squids are configured in a hierarchy where each feeds 4 others so
> loading gets spread out, but the fact that the error is not cached at
> all means that if the primary origin server is down, the squids near
> the top of the hierarchy will get hammered with hundreds of requests
> for the server that's down before every request that succeeds from the
> second server.
>
> Any suggestions? Is the fact that negative_ttl doesn't work with
> max_stale a bug, a missing feature, or an unfortunate interpretation of
> the HTTP 1.1 spec?
>
> By the way, I had hoped that 'Cache-Control: max-stale=0' would work
> the same as squid.conf's 'max_stale 0', but I never see an error come
> back when the origin server is down; it returns stale data instead. I
> wonder if that's intentional, a bug, or a missing feature. I also note
> that the HTTP 1.1 spec says that there MUST be a Warning 110 (Response
> is stale) header attached if stale data is returned, and I'm not seeing
> those.
>
> - Dave
Re: [squid-users] How get negative cache along with origin server error?
I found out a little bit more by looking in the source code and the generated headers and setting a few breakpoints. The squid closest to the origin server that is down (the one at the top of the cache_peer parent hierarchy) never attempts to store the negative result. Worse, it sets an Expires: header that is equal to the current time. Squids further down the hierarchy do call storeNegativeCache(), but they see an expiration time that is already past, so it isn't of any use. Those things make it seem like squid is far from being able to effectively handle failing over from one origin server to another at the application level.

- Dave

On Tue, Sep 30, 2008 at 10:32:43AM -0500, Dave Dykstra wrote:
> Do any of the squid experts have any answers for this?
> - Dave
>
> On Thu, Sep 25, 2008 at 02:04:09PM -0500, Dave Dykstra wrote:
> > I am running squid on over a thousand computers that are filtering
> > data coming out of one of the particle collision detectors on the
> > Large Hadron Collider. There are two origin servers, and the
> > application layer is designed to try the second server if the local
> > squid returns a 5xx HTTP code (server error).
> >
> > I just recently found that before squid 2.7 this could never happen,
> > because squid would just return stale data if the origin server was
> > down (more precisely, I've been testing with the server up but the
> > listener process down, so it gets 'connection refused'). In squid
> > 2.7STABLE4, if squid.conf has 'max_stale 0' or if the origin server
> > sends 'Cache-Control: must-revalidate', then squid will send a 504
> > Gateway Timeout error. Unfortunately, this timeout error does not get
> > cached, and it gets sent upstream every time no matter what
> > negative_ttl is set to.
> >
> > These squids are configured in a hierarchy where each feeds 4 others
> > so loading gets spread out, but the fact that the error is not cached
> > at all means that if the primary origin server is down, the squids
> > near the top of the hierarchy will get hammered with hundreds of
> > requests for the server that's down before every request that
> > succeeds from the second server.
> >
> > Any suggestions? Is the fact that negative_ttl doesn't work with
> > max_stale a bug, a missing feature, or an unfortunate interpretation
> > of the HTTP 1.1 spec?
> >
> > By the way, I had hoped that 'Cache-Control: max-stale=0' would work
> > the same as squid.conf's 'max_stale 0', but I never see an error come
> > back when the origin server is down; it returns stale data instead. I
> > wonder if that's intentional, a bug, or a missing feature. I also
> > note that the HTTP 1.1 spec says that there MUST be a Warning 110
> > (Response is stale) header attached if stale data is returned, and
> > I'm not seeing those.
> >
> > - Dave
[squid-users] How get negative cache along with origin server error?
I am running squid on over a thousand computers that are filtering data coming out of one of the particle collision detectors on the Large Hadron Collider. There are two origin servers, and the application layer is designed to try the second server if the local squid returns a 5xx HTTP code (server error).

I just recently found that before squid 2.7 this could never happen, because squid would just return stale data if the origin server was down (more precisely, I've been testing with the server up but the listener process down, so it gets 'connection refused'). In squid 2.7STABLE4, if squid.conf has 'max_stale 0' or if the origin server sends 'Cache-Control: must-revalidate', then squid will send a 504 Gateway Timeout error. Unfortunately, this timeout error does not get cached, and it gets sent upstream every time no matter what negative_ttl is set to.

These squids are configured in a hierarchy where each feeds 4 others so loading gets spread out, but the fact that the error is not cached at all means that if the primary origin server is down, the squids near the top of the hierarchy will get hammered with hundreds of requests for the server that's down before every request that succeeds from the second server.

Any suggestions? Is the fact that negative_ttl doesn't work with max_stale a bug, a missing feature, or an unfortunate interpretation of the HTTP 1.1 spec?

By the way, I had hoped that 'Cache-Control: max-stale=0' would work the same as squid.conf's 'max_stale 0', but I never see an error come back when the origin server is down; it returns stale data instead. I wonder if that's intentional, a bug, or a missing feature. I also note that the HTTP 1.1 spec says that there MUST be a Warning 110 (Response is stale) header attached if stale data is returned, and I'm not seeing those.

- Dave
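The application-layer failover described in the first paragraph can be sketched like this (server URLs and the fetch interface are hypothetical illustrations, not the actual client code):

```python
from urllib.request import urlopen
from urllib.error import HTTPError

# Hypothetical origin URLs; the local squid sits in front of these.
SERVERS = ["http://origin1.example.org:8000",
           "http://origin2.example.org:8000"]

def fetch_with_failover(path, fetch=None):
    """Try each origin in turn, falling through to the next one only on
    a 5xx response, which the local squid returns (e.g. 504 Gateway
    Timeout) when the origin is down and stale data may not be served."""
    fetch = fetch or (lambda url: urlopen(url).read())
    last_error = None
    for server in SERVERS:
        try:
            return fetch(server + path)
        except HTTPError as e:
            if e.code < 500:
                raise  # 4xx is a real answer from the origin, not a dead server
            last_error = e  # 5xx: try the next server
    raise last_error
```

This also shows why the uncached 504 matters: every client repeats the doomed first attempt against origin1 before succeeding on origin2, hammering the squids near the top of the hierarchy.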