Re: [squid-users] How get negative cache along with origin server error?

2008-10-07 Thread Dave Dykstra
On Sat, Oct 04, 2008 at 12:55:15PM -0400, Chris Nighswonger wrote:
 On Tue, Sep 30, 2008 at 6:13 PM, Dave Dykstra [EMAIL PROTECTED] wrote:
  On Thu, Sep 25, 2008 at 02:04:09PM -0500, Dave Dykstra wrote:
   I am running squid on over a thousand computers that are filtering data
   coming out of one of the particle collision detectors on the Large
   Hadron Collider.
 
 A bit off-topic here, but I'm wondering if these squids are being used
 in CERN's new computing grid? I noticed Fermi was helping out with
 this. 
 (http://devicedaily.com/misc/cern-launches-the-biggest-computing-grid-in-the-world.html)

The particular squids I was talking about are not considered to be part
of the grid; they're part of the High-Level Trigger filter farm that
is installed at the location of the CMS detector.  There are other
squids that are considered to be part of the grid, however, at each of
the locations around the world where CMS collision data is being
analyzed.  I own the piece of the software involved in moving detector
alignment and calibration data from CERN out to all the processors at all
the collaboration sites, which is needed to be able to understand the
collision data.  This data is on the order of 100 MB but needs to get
sent to all the analysis jobs (and some of it changes every day or so),
unlike the collision data, which is much larger but gets sent separately
to individual processors.  The software I own converts the data from a
database to HTTP, where it is cached in squids, and then converts the data
from HTTP to objects in memory.  The home page is frontier.cern.ch.

That article is misleading, by the way; the very nature of a computing
grid is that it doesn't belong to a single organization, so it's not
CERN's new computing grid.  It is a collaboration of many
organizations; many different organizations provide the computing
resources, and many different organizations provide the software that
controls the grid and the software that runs on the grid.

- Dave


Re: [squid-users] How get negative cache along with origin server error?

2008-10-07 Thread Dave Dykstra
Mark,

Thanks for that suggestion.  I had independently come to the same idea,
after posting my message, but haven't yet had a chance to try it out.  I
currently have hierarchies of cache_peer parents but stop the hierarchies
just before the last step to the origin servers because they were
selected by the host and port number in the URLs.  The origin servers have
their own squids configured in accelerator mode so I think I will just
extend the hierarchies all the way to them and let the squids (the ones
which were formerly the top of the hierarchies) take care of detecting
when an origin server goes down (using the cache_peer monitorurl
option).  I did a little experiment and found out that it doesn't matter
what the host and port number are in a URL if the top of a cache_peer
parent hierarchy is an accelerator mode squid, so I don't think I'll
even have to change the application.

- Dave


On Fri, Oct 03, 2008 at 11:21:19AM +1000, Mark Nottingham wrote:
 Have you considered setting squid up to know about both origins, so it  
 can fail over automatically?
 
 
 On 26/09/2008, at 5:04 AM, Dave Dykstra wrote:
 
 I am running squid on over a thousand computers that are filtering data
 coming out of one of the particle collision detectors on the Large
 Hadron Collider.  There are two origin servers, and the application
 layer is designed to try the second server if the local squid returns a
 5xx HTTP code (server error).  I just recently found that before squid
 2.7 this could never happen because squid would just return stale data
 if the origin server was down (more precisely, I've been testing with
 the server up but the listener process down so it gets 'connection
 refused').  In squid 2.7STABLE4, if squid.conf has 'max_stale 0' or if
 the origin server sends 'Cache-Control: must-revalidate' then squid will
 send a 504 Gateway Timeout error.  Unfortunately, this timeout error
 does not get cached, and it gets sent upstream every time no matter what
 negative_ttl is set to.  These squids are configured in a hierarchy
 where each feeds 4 others so loading gets spread out, but the fact that
 the error is not cached at all means that if the primary origin server
 is down, the squids near the top of the hierarchy will get hammered with
 hundreds of requests for the server that's down before every request
 that succeeds from the second server.
 
 Any suggestions?  Is the fact that negative_ttl doesn't work with
 max_stale a bug, a missing feature, or an unfortunate interpretation of
 the HTTP 1.1 spec?
 
 By the way, I had hoped that 'Cache-Control: max-stale=0' would work the
 same as squid.conf's 'max_stale 0' but I never see an error come back
 when the origin server is down; it returns stale data instead.  I wonder
 if that's intentional, a bug, or a missing feature.  I also note that
 the HTTP 1.1 spec says that there MUST be a Warning 110 (Response is
 stale) header attached if stale data is returned and I'm not seeing
 those.
 
 - Dave


Re: [squid-users] How get negative cache along with origin server error?

2008-10-07 Thread Dave Dykstra
Henrik,

Thanks so much for your very informative reply!

On Thu, Oct 02, 2008 at 12:31:03PM +0200, Henrik Nordstrom wrote:
 By default Squid tries to use a parent 10 times before declaring it
 dead.

Ah, I never would have guessed that I needed to try 10 times before
negative_ttl would take effect for a dead host.  That wouldn't be
bad at all.

I just tried this now by having two squids, one a cache_peer parent of
the other.  I requested a URL while the origin server was up in order to
load the cache, with a CC max-age of 180.  Both squids have max_stale 0
and negative_ttl of 3 minutes.  Next, I put the origin server name as an
alias for localhost in /etc/hosts on both machines the squids were on,
so they both see connection refused when they try to connect to the
origin server.  I also restarted nscd and did squid -k reconfigure to
make sure the new host name was seen by squid.  After the (small) object
in the cache expired, I retried the request 20 times in a row.  Every
time I still saw the request get sent from the child squid to the parent
squid and return a 504 error.  This is unexpected to me; is it to you,
Henrik?  I would have thought the 504 error would get cached for three
minutes after the tenth try.

 Each time Squid retries a request it falls back on the next possible
 path for forwarding the request. What that is depends on your
 configuration. In normal forwarding without never_direct there are
 usually at most two active paths selected: the selected peer, if
 any, plus going direct. In accelerator mode or with never_direct more
 peers are selected as candidates (one sibling, and all possible parents).
 
 These retries happen on
 
 * 504 Gateway Timeout  (including local connection failure)
 * 502 Bad Gateway
 
 or, if retry_on_error is enabled, also on
 
 * 403 Forbidden
 * 500 Server Error
 * 501 Not Implemented
 * 503 Service Unavailable
 
 Please note that there is a slight name confusion relating to max-stale.
 Cache-Control: max-stale is not the same as the squid.conf directive. 
 
 Cache-Control: max-stale=N is a permissive request directive, saying
 that responses up to the given staleness are accepted as fresh without
 needing a cache validation. It's not defined for responses.
 
 The squid.conf setting is a restrictive directive, placing an upper
 limit on how stale content may be returned if cache validations fail.
 
 The Cache-Control: stale-if-error response header is equivalent to the
 squid.conf max_stale setting, and overrides squid.conf.

That's very good to know.  I didn't see that in the HTTP 1.1 spec, but
I see that Mark Nottingham submitted a draft protocol extension with
this feature.

 The default for stale-if-error if not specified (and for squid.conf
 max_stale) is infinite.
 
 Warning headers are not yet implemented by Squid. This is on the todo list.

Sounds good.

- Dave


Re: [squid-users] How get negative cache along with origin server error?

2008-10-07 Thread Henrik Nordstrom
On tis, 2008-10-07 at 11:49 -0500, Dave Dykstra wrote:

 Ah, I never would have guessed that I needed to try 10 times before
 negative_ttl would take effect for a dead host.  That wouldn't be
 bad at all.

You don't. Squid does that for you automatically. 

 time I still saw the request get sent from the child squid to the parent
 squid and return a 504 error.  This is unexpected to me; is it to you,
 Henrik?  I would have thought the 504 error would get cached for three
 minutes after the tenth try.

Agreed.


  The Cache-Control: stale-if-error response header is equivalent to the
  squid.conf max_stale setting, and overrides squid.conf.
 
 That's very good to know.  I didn't see that in the HTTP 1.1 spec, but
 I see that Mark Nottingham submitted a draft protocol extension with
 this feature.

Correct.

Regards
Henrik


Re: [squid-users] How get negative cache along with origin server error?

2008-10-07 Thread Dave Dykstra
On Tue, Oct 07, 2008 at 08:38:12PM +0200, Henrik Nordstrom wrote:
 On tis, 2008-10-07 at 11:49 -0500, Dave Dykstra wrote:
 
  Ah, I never would have guessed that I needed to try 10 times before
  negative_ttl would take effect for a dead host.  That wouldn't be
  bad at all.
 
 You don't. Squid does that for you automatically. 

I meant, in my testing I needed to try 10 times to see if negative_ttl
caching was working.  Or are you saying that squid tries to contact the
origin server 10 times on the first request before it even returns the
first 504?  I thought you meant it kept track of the number of client
attempts and should start caching it after 10 failures.

  time I still saw the request get sent from the child squid to the parent
  squid and return a 504 error.  This is unexpected to me; is it to you,
  Henrik?  I would have thought the 504 error would get cached for three
  minutes after the tenth try.
 
 Agreed.

Ok, then I will file a bugzilla report.

Meanwhile, I believe I have a workaround as I discussed in another post
on this thread
http://www.squid-cache.org/mail-archive/squid-users/200810/0171.html

Thanks,

- Dave


Re: [squid-users] How get negative cache along with origin server error?

2008-10-04 Thread Chris Nighswonger
Hi Dave,

On Tue, Sep 30, 2008 at 6:13 PM, Dave Dykstra [EMAIL PROTECTED] wrote:
 I found out a little bit more by looking in the source code and the
 generated headers and setting a few breakpoints.  The squid closest to
 the origin server that is down (the one at the top of the cache_peer
 parent hierarchy) never attempts to store the negative result.  Worse,
 it sets an Expires: header that is equal to the current time.  Squids
 further down the hierarchy do call storeNegativeCache() but they see
 an expiration time that is already past so it isn't of any use.

 Those things make it seem like squid is far from being able to
 effectively handle failing over from one origin server to another
 at the application level.

 - Dave

 On Tue, Sep 30, 2008 at 10:32:43AM -0500, Dave Dykstra wrote:
 Do any of the squid experts have any answers for this?

 - Dave

 On Thu, Sep 25, 2008 at 02:04:09PM -0500, Dave Dykstra wrote:
  I am running squid on over a thousand computers that are filtering data
  coming out of one of the particle collision detectors on the Large
  Hadron Collider.

A bit off-topic here, but I'm wondering if these squids are being used
in CERN's new computing grid? I noticed Fermi was helping out with
this. 
(http://devicedaily.com/misc/cern-launches-the-biggest-computing-grid-in-the-world.html)

Regards,
Chris


Re: [squid-users] How get negative cache along with origin server error?

2008-10-02 Thread Henrik Nordstrom
By default Squid tries to use a parent 10 times before declaring it
dead.

Each time Squid retries a request it falls back on the next possible
path for forwarding the request. What that is depends on your
configuration. In normal forwarding without never_direct there are
usually at most two active paths selected: the selected peer, if
any, plus going direct. In accelerator mode or with never_direct more
peers are selected as candidates (one sibling, and all possible parents).

These retries happen on

* 504 Gateway Timeout  (including local connection failure)
* 502 Bad Gateway

or, if retry_on_error is enabled, also on

* 403 Forbidden
* 500 Server Error
* 501 Not Implemented
* 503 Service Unavailable
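
As a rough illustration of the retry rules listed above, here is a minimal Python sketch. It is not Squid's code; the function name should_retry and the two sets are hypothetical, and 403 is used because that is the HTTP status code for Forbidden.

```python
# Sketch of the retry decision described above; not taken from Squid's source.
ALWAYS_RETRY = {502, 504}                 # Bad Gateway, Gateway Timeout
RETRY_IF_ENABLED = {403, 500, 501, 503}   # only when retry_on_error is on


def should_retry(status: int, retry_on_error: bool = False) -> bool:
    """Return True if a reply with this status would make the proxy try the next path."""
    if status in ALWAYS_RETRY:
        return True
    return retry_on_error and status in RETRY_IF_ENABLED
```

With these assumptions, should_retry(503) is false unless retry_on_error is passed as True, while 502 and 504 always lead to another attempt.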

Please note that there is a slight name confusion relating to max-stale.
Cache-Control: max-stale is not the same as the squid.conf directive. 

Cache-Control: max-stale=N is a permissive request directive, saying
that responses up to the given staleness are accepted as fresh without
needing a cache validation. It's not defined for responses.

The squid.conf setting is a restrictive directive, placing an upper
limit on how stale content may be returned if cache validations fail.

The Cache-Control: stale-if-error response header is equivalent to the
squid.conf max_stale setting, and overrides squid.conf.

The default for stale-if-error if not specified (and for squid.conf
max_stale) is infinite.
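
To make the relationship between the squid.conf limit and stale-if-error concrete, here is a small Python sketch of the decision as described above. The function and parameter names are hypothetical, and the exact boundary comparison Squid uses is an assumption.

```python
import math
from typing import Optional


def may_serve_stale(staleness_s: float,
                    conf_max_stale_s: float = math.inf,
                    stale_if_error_s: Optional[float] = None) -> bool:
    """Decide whether a stale object may be returned after validation fails.

    staleness_s      -- seconds past the object's freshness lifetime
    conf_max_stale_s -- the squid.conf limit (infinite when not configured)
    stale_if_error_s -- value from the response's Cache-Control: stale-if-error,
                        which overrides the squid.conf limit when present
    """
    limit = conf_max_stale_s if stale_if_error_s is None else stale_if_error_s
    # Assumption: an object exactly at the limit is still allowed.
    return staleness_s <= limit
```

Under this model a limit of 0 means any object past its freshness lifetime is refused (so the client gets an error instead), and with nothing configured stale content is always served.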


Warning headers are not yet implemented by Squid. This is on the todo list.

Regards
Henrik

On tis, 2008-09-30 at 10:32 -0500, Dave Dykstra wrote:
 Do any of the squid experts have any answers for this?
 
 - Dave
 
 On Thu, Sep 25, 2008 at 02:04:09PM -0500, Dave Dykstra wrote:
  I am running squid on over a thousand computers that are filtering data
  coming out of one of the particle collision detectors on the Large
  Hadron Collider.  There are two origin servers, and the application
  layer is designed to try the second server if the local squid returns a
  5xx HTTP code (server error).  I just recently found that before squid
  2.7 this could never happen because squid would just return stale data
  if the origin server was down (more precisely, I've been testing with
  the server up but the listener process down so it gets 'connection
  refused').  In squid 2.7STABLE4, if squid.conf has 'max_stale 0' or if
  the origin server sends 'Cache-Control: must-revalidate' then squid will
  send a 504 Gateway Timeout error.  Unfortunately, this timeout error
  does not get cached, and it gets sent upstream every time no matter what
  negative_ttl is set to.  These squids are configured in a hierarchy
  where each feeds 4 others so loading gets spread out, but the fact that
  the error is not cached at all means that if the primary origin server
  is down, the squids near the top of the hierarchy will get hammered with
  hundreds of requests for the server that's down before every request
  that succeeds from the second server.
  
  Any suggestions?  Is the fact that negative_ttl doesn't work with
  max_stale a bug, a missing feature, or an unfortunate interpretation of
  the HTTP 1.1 spec?
  
  By the way, I had hoped that 'Cache-Control: max-stale=0' would work the
  same as squid.conf's 'max_stale 0' but I never see an error come back
  when the origin server is down; it returns stale data instead.  I wonder
  if that's intentional, a bug, or a missing feature.  I also note that
  the HTTP 1.1 spec says that there MUST be a Warning 110 (Response is
  stale) header attached if stale data is returned and I'm not seeing
  those.
  
  - Dave


Re: [squid-users] How get negative cache along with origin server error?

2008-10-02 Thread Mark Nottingham
Have you considered setting squid up to know about both origins, so it  
can fail over automatically?



On 26/09/2008, at 5:04 AM, Dave Dykstra wrote:

I am running squid on over a thousand computers that are filtering data
coming out of one of the particle collision detectors on the Large
Hadron Collider.  There are two origin servers, and the application
layer is designed to try the second server if the local squid returns a
5xx HTTP code (server error).  I just recently found that before squid
2.7 this could never happen because squid would just return stale data
if the origin server was down (more precisely, I've been testing with
the server up but the listener process down so it gets 'connection
refused').  In squid 2.7STABLE4, if squid.conf has 'max_stale 0' or if
the origin server sends 'Cache-Control: must-revalidate' then squid will
send a 504 Gateway Timeout error.  Unfortunately, this timeout error
does not get cached, and it gets sent upstream every time no matter what
negative_ttl is set to.  These squids are configured in a hierarchy
where each feeds 4 others so loading gets spread out, but the fact that
the error is not cached at all means that if the primary origin server
is down, the squids near the top of the hierarchy will get hammered with
hundreds of requests for the server that's down before every request
that succeeds from the second server.

Any suggestions?  Is the fact that negative_ttl doesn't work with
max_stale a bug, a missing feature, or an unfortunate interpretation of
the HTTP 1.1 spec?

By the way, I had hoped that 'Cache-Control: max-stale=0' would work the
same as squid.conf's 'max_stale 0' but I never see an error come back
when the origin server is down; it returns stale data instead.  I wonder
if that's intentional, a bug, or a missing feature.  I also note that
the HTTP 1.1 spec says that there MUST be a Warning 110 (Response is
stale) header attached if stale data is returned and I'm not seeing
those.

- Dave


--
Mark Nottingham   [EMAIL PROTECTED]




Re: [squid-users] How get negative cache along with origin server error?

2008-09-30 Thread Dave Dykstra
Do any of the squid experts have any answers for this?

- Dave

On Thu, Sep 25, 2008 at 02:04:09PM -0500, Dave Dykstra wrote:
 I am running squid on over a thousand computers that are filtering data
 coming out of one of the particle collision detectors on the Large
 Hadron Collider.  There are two origin servers, and the application
 layer is designed to try the second server if the local squid returns a
 5xx HTTP code (server error).  I just recently found that before squid
 2.7 this could never happen because squid would just return stale data
 if the origin server was down (more precisely, I've been testing with
 the server up but the listener process down so it gets 'connection
 refused').  In squid 2.7STABLE4, if squid.conf has 'max_stale 0' or if
 the origin server sends 'Cache-Control: must-revalidate' then squid will
 send a 504 Gateway Timeout error.  Unfortunately, this timeout error
 does not get cached, and it gets sent upstream every time no matter what
 negative_ttl is set to.  These squids are configured in a hierarchy
 where each feeds 4 others so loading gets spread out, but the fact that
 the error is not cached at all means that if the primary origin server
 is down, the squids near the top of the hierarchy will get hammered with
 hundreds of requests for the server that's down before every request
 that succeeds from the second server.
 
 Any suggestions?  Is the fact that negative_ttl doesn't work with
 max_stale a bug, a missing feature, or an unfortunate interpretation of
 the HTTP 1.1 spec?
 
 By the way, I had hoped that 'Cache-Control: max-stale=0' would work the
 same as squid.conf's 'max_stale 0' but I never see an error come back
 when the origin server is down; it returns stale data instead.  I wonder
 if that's intentional, a bug, or a missing feature.  I also note that
 the HTTP 1.1 spec says that there MUST be a Warning 110 (Response is
 stale) header attached if stale data is returned and I'm not seeing
 those.
 
 - Dave


Re: [squid-users] How get negative cache along with origin server error?

2008-09-30 Thread Dave Dykstra
I found out a little bit more by looking in the source code and the
generated headers and setting a few breakpoints.  The squid closest to
the origin server that is down (the one at the top of the cache_peer
parent hierarchy) never attempts to store the negative result.  Worse,
it sets an Expires: header that is equal to the current time.  Squids
further down the hierarchy do call storeNegativeCache() but they see
an expiration time that is already past so it isn't of any use.
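
A small Python sketch of why an Expires: header equal to the generation time leaves nothing for a downstream cache to hold on to. The helper name remaining_lifetime, the timestamps, and the 50 ms delay are made up for illustration; this is not Squid's logic.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def remaining_lifetime(expires_header: str, now: datetime) -> timedelta:
    """Time left before a reply with this Expires header goes stale (never negative)."""
    expires = parsedate_to_datetime(expires_header)
    return max(expires - now, timedelta(0))


# An error reply generated at some instant, with Expires set to that same instant.
generated_at = datetime(2008, 9, 30, 23, 13, 0, tzinfo=timezone.utc)
expires_header = format_datetime(generated_at, usegmt=True)

# By the time a downstream cache examines the reply, a little time has passed.
seen_at = generated_at + timedelta(milliseconds=50)
lifetime = remaining_lifetime(expires_header, seen_at)
```

Here lifetime comes out as zero, so even a cache that is willing to store the error has no interval in which it could reuse it.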

Those things make it seem like squid is far from being able to
effectively handle failing over from one origin server to another
at the application level.

- Dave

On Tue, Sep 30, 2008 at 10:32:43AM -0500, Dave Dykstra wrote:
 Do any of the squid experts have any answers for this?
 
 - Dave
 
 On Thu, Sep 25, 2008 at 02:04:09PM -0500, Dave Dykstra wrote:
  I am running squid on over a thousand computers that are filtering data
  coming out of one of the particle collision detectors on the Large
  Hadron Collider.  There are two origin servers, and the application
  layer is designed to try the second server if the local squid returns a
  5xx HTTP code (server error).  I just recently found that before squid
  2.7 this could never happen because squid would just return stale data
  if the origin server was down (more precisely, I've been testing with
  the server up but the listener process down so it gets 'connection
  refused').  In squid 2.7STABLE4, if squid.conf has 'max_stale 0' or if
  the origin server sends 'Cache-Control: must-revalidate' then squid will
  send a 504 Gateway Timeout error.  Unfortunately, this timeout error
  does not get cached, and it gets sent upstream every time no matter what
  negative_ttl is set to.  These squids are configured in a hierarchy
  where each feeds 4 others so loading gets spread out, but the fact that
  the error is not cached at all means that if the primary origin server
  is down, the squids near the top of the hierarchy will get hammered with
  hundreds of requests for the server that's down before every request
  that succeeds from the second server.
  
  Any suggestions?  Is the fact that negative_ttl doesn't work with
  max_stale a bug, a missing feature, or an unfortunate interpretation of
  the HTTP 1.1 spec?
  
  By the way, I had hoped that 'Cache-Control: max-stale=0' would work the
  same as squid.conf's 'max_stale 0' but I never see an error come back
  when the origin server is down; it returns stale data instead.  I wonder
  if that's intentional, a bug, or a missing feature.  I also note that
  the HTTP 1.1 spec says that there MUST be a Warning 110 (Response is
  stale) header attached if stale data is returned and I'm not seeing
  those.
  
  - Dave


[squid-users] How get negative cache along with origin server error?

2008-09-25 Thread Dave Dykstra
I am running squid on over a thousand computers that are filtering data
coming out of one of the particle collision detectors on the Large
Hadron Collider.  There are two origin servers, and the application
layer is designed to try the second server if the local squid returns a
5xx HTTP code (server error).  I just recently found that before squid
2.7 this could never happen because squid would just return stale data
if the origin server was down (more precisely, I've been testing with
the server up but the listener process down so it gets 'connection
refused').  In squid 2.7STABLE4, if squid.conf has 'max_stale 0' or if
the origin server sends 'Cache-Control: must-revalidate' then squid will
send a 504 Gateway Timeout error.  Unfortunately, this timeout error
does not get cached, and it gets sent upstream every time no matter what
negative_ttl is set to.  These squids are configured in a hierarchy
where each feeds 4 others so loading gets spread out, but the fact that
the error is not cached at all means that if the primary origin server
is down, the squids near the top of the hierarchy will get hammered with
hundreds of requests for the server that's down before every request
that succeeds from the second server.
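
For illustration only, the application-layer failover described above could be sketched in Python as follows, with the client itself remembering a 5xx for a while so that the failed server is not asked again on every request. This is not the actual application code: the class name FailoverClient, the injected fetch callable, and the 180-second default are all assumptions.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical transport: (server, path) -> (HTTP status, body)
Fetch = Callable[[str, str], Tuple[int, bytes]]


class FailoverClient:
    """Try servers in order; remember recent 5xx failures for negative_ttl seconds."""

    def __init__(self, servers: List[str], fetch: Fetch,
                 negative_ttl: float = 180.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.servers = servers
        self.fetch = fetch
        self.negative_ttl = negative_ttl
        self.clock = clock
        self._down_until: Dict[str, float] = {}

    def get(self, path: str) -> Tuple[int, bytes]:
        last: Optional[Tuple[int, bytes]] = None
        for server in self.servers:
            if self._down_until.get(server, float("-inf")) > self.clock():
                continue  # failure still remembered; skip without asking
            status, body = self.fetch(server, path)
            if status < 500:
                return status, body
            # 5xx: remember the failure and fall through to the next server
            self._down_until[server] = self.clock() + self.negative_ttl
            last = (status, body)
        if last is None:
            return 504, b""  # every server is currently marked down
        return last
```

The design choice here is to keep the negative result on the client side, which sidesteps the uncached 504 but does nothing for other clients behind the same squid.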

Any suggestions?  Is the fact that negative_ttl doesn't work with
max_stale a bug, a missing feature, or an unfortunate interpretation of
the HTTP 1.1 spec?

By the way, I had hoped that 'Cache-Control: max-stale=0' would work the
same as squid.conf's 'max_stale 0' but I never see an error come back
when the origin server is down; it returns stale data instead.  I wonder
if that's intentional, a bug, or a missing feature.  I also note that
the HTTP 1.1 spec says that there MUST be a Warning 110 (Response is
stale) header attached if stale data is returned and I'm not seeing
those.

- Dave