Re: store url rewriter stuff

2007-11-14 Thread Adrian Chadd
On Thu, Nov 15, 2007, Adrian Chadd wrote:
> I've started work on the store url rewriter stuff in Squid-HEAD.

It's now spitting URLs to the redirector and reading them back. It's not yet
using this in the request_t.

The redirector I'm currently using:

#!/usr/bin/perl -w

# Store-URL rewriter helper: collapse the numbered Google tile hosts
# (kh0..khN, mt0..mtN) into a single canonical host so all mirrors share
# one cache entry.  The URL is the first whitespace-separated field on
# each input line; one rewritten URL is printed per line.

$| = 1;

while (<>) {
    chomp;
    # print STDERR $_ . "\n";
    if (m/kh(.*?)\.google\.com(.*?)\/(.*?) /) {
        print "http://keyhole-srv.google.com" . $2 . ".SQUIDINTERNAL/"
            . $3 . "\n";
        # print STDERR "KEYHOLE\n";
    } elsif (m/mt(.*?)\.google\.com(.*?)\/(.*?) /) {
        print "http://map-srv.google.com" . $2 . ".SQUIDINTERNAL/" . $3
            . "\n";
        # print STDERR "MAPSRV\n";
    } else {
        # Anything else passes through unchanged.
        print $_ . "\n";
    }
}

Example rewritten URL:

2007/11/15 14:46:27| clientStoreURLRewriteDone: 
'http://kh2.google.com.au/kh?n=404&v=22&t=tqtss' 
result=http://keyhole-srv.google.com.au.SQUIDINTERNAL/kh?n=404&v=22&t=tqtss
2007/11/15 14:46:27| clientStoreURLRewriteDone: 
'http://kh3.google.com.au/kh?n=404&v=22&t=ttqrr' 
result=http://keyhole-srv.google.com.au.SQUIDINTERNAL/kh?n=404&v=22&t=ttqrr
2007/11/15 14:46:27| clientStoreURLRewriteDone: 
'http://mt2.google.com/mt?n=404&v=w2t.61&x=3&y=7&zoom=13' 
result=http://map-srv.google.com.SQUIDINTERNAL/mt?n=404&v=w2t.61&x=3&y=7&zoom=13
2007/11/15 14:46:27| clientStoreURLRewriteDone: 
'http://mt3.google.com/mt?n=404&v=w2t.61&x=3&y=8&zoom=13' 
result=http://map-srv.google.com.SQUIDINTERNAL/mt?n=404&v=w2t.61&x=3&y=8&zoom=13
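
For reference, a rough sketch of how a helper like this might be hooked up
in squid.conf once the directives land. The directive names and the ACL here
are assumptions, not final squid-2.HEAD syntax:

# Hypothetical hook-up for the store-URL rewriter helper above
# (directive names are placeholders, not settled syntax).
storeurl_rewrite_program /usr/local/bin/store_url_rewrite.pl
storeurl_rewrite_children 5

# Only rewrite the hosts the helper actually knows about.
acl store_rewrite_list dstdomain .google.com .google.com.au
storeurl_access allow store_rewrite_list
storeurl_access deny all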

Comments are welcome.




Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -


Re: caching "dynamic" content

2007-11-14 Thread Duane Wessels

On Thu, 15 Nov 2007, Adrian Chadd wrote:

> I'd like to see something default in the next Squid release, so we can
> release it with a few interesting tag lines like "Can cache google maps!"

I can support removing '?' from the default QUERY acl definition.

I cannot support adding default 'rep_header' ACL types.

Duane W.


store url rewriter stuff

2007-11-14 Thread Adrian Chadd
I've started work on the store url rewriter stuff in Squid-HEAD.
The eventual aim is to be able to use a "canonicalised" URL form for store
lookups, working around the tricks used by CDNs and other websites to
distribute content and improve parallelism. Think google maps, youtube and
microsoft updates caching.

The work done in Squid-2-HEAD so far is just laying groundwork and slightly
reshuffling store_client.c code around - shifting the redirector code out into
an external file - as preparation for further work.

The current work shouldn't affect normal squid-2.HEAD behaviour, as it's not
yet passing anything to the storeurl rewriters, nor is it using the
request_t->storeurl string in any cache lookups. There's the possibility
that I'll break a lot of stuff in the store key manipulations, so that will
be the very last thing I implement, and only after plenty of other testing.

(Note: This work isn't being sponsored by anyone; I'm just fed up that
it's not implemented..)


Thanks,



Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -


Re: caching "dynamic" content

2007-11-14 Thread Adrian Chadd
On Wed, Nov 14, 2007, Duane Wessels wrote:

>   #We recommend you to use the following two lines.
>   acl QUERY urlpath_regex cgi-bin \?
>   cache deny QUERY
> 
> That's what I'm supporting/suggesting: remove the default 'cache deny'
> lines and add some default refresh_pattern lines.

I've removed those and I've added the refresh_pattern lines.
Is that "enough" or am I missing some subtlety?

Google Maps currently handles If-Modified-Since requests a bit wrong
(it returns the entire reply body regardless of the IMS date).





Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -


Re: caching "dynamic" content

2007-11-14 Thread Adrian Chadd
On Thu, Nov 15, 2007, Henrik Nordstrom wrote:

> And what is "from an HTTP/1.0 server"? The next hop, or does it refer to
> the protocol version of the origin server (which can't be deduced that
> easily)?

Ok. What about that and the Cache-Control: max-age=0 in the forwarded
requests?



Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -


Re: caching "dynamic" content

2007-11-14 Thread Adrian Chadd
On Thu, Nov 15, 2007, Henrik Nordstrom wrote:

> > Will what I've done above actually stop storing the data entirely, or will
> > it try revalidating it every request? Is there really a difference?
> 
> It will stop caching, at least unless there is an ETag or Last-Modified.

Ok good!

> Note: The RFC does not forbid caching, only considering the response fresh
> without an explicit expiry time or validation. So it's fine.

> However, there is still the HTTP/1.0 "MUST NOT cache" requirement. Not
> that I really have any idea what that's about, however.

If people -are- returning freshness info in a ? URL then it's entirely
possible they've got a clue, right? Or not?




Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -


Re: caching "dynamic" content

2007-11-14 Thread Henrik Nordstrom
On tor, 2007-11-15 at 11:49 +0900, Adrian Chadd wrote:

> > However, there is still the HTTP/1.0 "MUST NOT cache" requirement. Not
> > that I really have any idea what that's about, however.
> 
> If people -are- returning freshness info in a ? URL then it's entirely
> possible they've got a clue, right? Or not?

Exactly.

And what is "from an HTTP/1.0 server"? The next hop, or does it refer to
the protocol version of the origin server (which can't be deduced that
easily)?

Regards
Henrik




Re: caching "dynamic" content

2007-11-14 Thread Henrik Nordstrom
On tor, 2007-11-15 at 10:28 +0900, Adrian Chadd wrote:

> Will what I've done above actually stop storing the data entirely, or will
> it try revalidating it every request? Is there really a difference?

It will stop caching, at least unless there is an ETag or Last-Modified.

Note: The RFC does not forbid caching, only considering the response fresh
without an explicit expiry time or validation. So it's fine.

However, there is still the HTTP/1.0 "MUST NOT cache" requirement. Not
that I really have any idea what that's about, however.

Regards
Henrik




Re: caching "dynamic" content

2007-11-14 Thread Henrik Nordstrom
On ons, 2007-11-14 at 18:20 -0700, Duane Wessels wrote:

> While we're at it we could probably also get rid of the silly gopher
> refresh_pattern line.

There are still gophers around, and it's a supported protocol. Without
that line gopher responses won't get cached, as they do not have any
freshness information but are still considered more or less static..

Regards
Henrik




Re: caching "dynamic" content

2007-11-14 Thread Henrik Nordstrom
On tor, 2007-11-15 at 10:17 +0900, Adrian Chadd wrote:
> On Wed, Nov 14, 2007, Duane Wessels wrote:
> 
> > >I'd like to see something default in the next Squid release, so we can
> > >release it with a few interesting tag lines like "Can cache google maps!"
> > 
> > I can support removing '?' from the default QUERY acl definition.
> > 
> > I cannot support adding default 'rep_header' ACL types.
> 
> What about default refresh_pattern to not cache cgi-bin and/or ? URLs?

The default refresh_pattern is already good, apart from the silly fact
that Last-Modified heuristics are not allowed here by the
RFC...

Regards
Henrik




Introduction of myself

2007-11-14 Thread sejda
Hi,
my name is Radek Sečka (in emails I rather write Secka, without diacritics).
I have been working in IT (especially writing programs) for 7 years. I program
mostly on Windows/.NET, but I have no problem with other platforms thanks to my
experience.
I am 25 years old, have little spare time, live with my girlfriend, and am
looking for family and happiness.

Nowadays I study programming at the Faculty of Mathematics and Physics of
Charles University in Prague, Czech Republic (EU).
And that's how I came across the Squid project.

I have chosen the implementation of a renderer inside Squid as my graduate
work. This renderer should reformat pages for devices with smaller displays,
like Pocket PCs, mobile phones, TVs, etc. We think it is much cheaper to
install a renderer than to make a separate version of a website for these
devices (using a technology like WAP or similar).

I have already read all the documentation you have on the Squid website, and I
have been looking into the Squid source code since last year. So I could modify
the sources on my own, but I think it would be best to hear your opinion first.

With best wishes,
Radek Secka



Re: caching "dynamic" content

2007-11-14 Thread Adrian Chadd
On Wed, Nov 14, 2007, Duane Wessels wrote:

> >What about default refresh_pattern to not cache cgi-bin and/or ? URLs?
> 
> I assume you mean to always refresh (validate) cgi-bin and/or ?

Ideally I'd like to cache cgi-bin / ? content if cache information is
given (max-age, Expires, etc; henrik knows more about the options than
I!) and not cache the rest.

I'm not sure my current refresh patterns handle this:

refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern cgi-bin         0       0%      0
refresh_pattern \?              0       0%      0
refresh_pattern .               0       20%     4320

> Because if you don't want them to be cached then the 'cache' access
> list is the place to do that.
> 
> yes, I could support default refresh_pattern lines for ? and cgi-bin,
> and then remove the default 'cache' rules I suppose.

Will what I've done above actually stop storing the data entirely, or will
it try revalidating it every request? Is there really a difference?

> While we're at it we could probably also get rid of the silly gopher
> refresh_pattern line.

:)




Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -


Re: caching "dynamic" content

2007-11-14 Thread Henrik Nordstrom
On tor, 2007-11-15 at 09:37 +0900, Adrian Chadd wrote:
> G'day,
> 
> I'd like to propose a Squid modification - to cache "dynamic" content
> > that's playing "good".

Yes, it's as simple as removing the cache lines from the default
suggested config and making sure your refresh_pattern rules are
reasonable. The actual default is none.

The reason we block query URLs is just RFC compliance, in case someone
uses refresh_pattern with a min age > 0, plus the legacy of those lines
always having been there..

If we address this, I think it's better done in refresh_pattern than in
no_cache. Additionally, refresh_pattern is in serious need of a cleanup,
as it's been too heavily overloaded (and more is coming..).

What the RFC says in 13.9 Side Effects of GET and HEAD:

   caches MUST NOT treat responses to such URIs as fresh unless
   the server provides an explicit expiration time. This specifically
   means that responses from HTTP/1.0 servers for such URIs SHOULD NOT
   be taken from a cache.

Note: Explicit expiry time is Expires or Cache-Control max-age/s-maxage.
Last-Modified is not. But I'd argue that the fact that Last-Modified
based heuristics are not allowed is just an oversight and not
intentional.

Regards
Henrik




Re: caching "dynamic" content

2007-11-14 Thread Duane Wessels

On Thu, 15 Nov 2007, Adrian Chadd wrote:

> Ideally I'd like to cache cgi-bin / ? content if cache information is
> given (max-age, Expires, etc; henrik knows more about the options than
> I!) and not cache the rest.

right.

> I'm not sure my current refresh patterns handle this:
>
> refresh_pattern ^ftp:           1440    20%     10080
> refresh_pattern ^gopher:        1440    0%      1440
> refresh_pattern cgi-bin         0       0%      0
> refresh_pattern \?              0       0%      0
> refresh_pattern .               0       20%     4320

You also have to remove these:

#We recommend you to use the following two lines.
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY

That's what I'm supporting/suggesting: remove the default 'cache deny'
lines and add some default refresh_pattern lines.
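
Putting the two suggestions together, the default could then look roughly
like this (a sketch of the proposal only, not final squid.conf wording):

# Sketch: drop the QUERY acl / 'cache deny QUERY' pair and rely on
# refresh_pattern instead.  Responses carrying explicit expiry info
# (Expires, Cache-Control max-age) are then cacheable as usual; anything
# else is never treated as fresh without validation.
refresh_pattern cgi-bin         0       0%      0
refresh_pattern \?              0       0%      0
refresh_pattern .               0       20%     4320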

Duane W.


Re: caching "dynamic" content

2007-11-14 Thread Duane Wessels

On Thu, 15 Nov 2007, Adrian Chadd wrote:

> What about default refresh_pattern to not cache cgi-bin and/or ? URLs?

I assume you mean to always refresh (validate) cgi-bin and/or ?

Because if you don't want them to be cached then the 'cache' access
list is the place to do that.

yes, I could support default refresh_pattern lines for ? and cgi-bin,
and then remove the default 'cache' rules I suppose.

While we're at it we could probably also get rid of the silly gopher
refresh_pattern line.

Duane W.


Re: caching "dynamic" content

2007-11-14 Thread Adrian Chadd
On Wed, Nov 14, 2007, Duane Wessels wrote:

> >I'd like to see something default in the next Squid release, so we can
> >release it with a few interesting tag lines like "Can cache google maps!"
> 
> I can support removing '?' from the default QUERY acl definition.
> 
> I cannot support adding default 'rep_header' ACL types.

What about default refresh_pattern to not cache cgi-bin and/or ? URLs?



Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -


Re: caching "dynamic" content

2007-11-14 Thread Adrian Chadd
On Thu, Nov 15, 2007, Robert Collins wrote:

> > I'd like to propose a Squid modification - to cache "dynamic" content
> > that's playing "good".
> 
> I long ago turned off the ? cache deny rule on my caches.
> 
> That said, making reply header checks be able to influence cache rules
> sounds like a good idea.

I'd like to see something default in the next Squid release, so we can
release it with a few interesting tag lines like "Can cache google maps!"





Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -


caching "dynamic" content

2007-11-14 Thread Adrian Chadd
G'day,

I'd like to propose a Squid modification - to cache "dynamic" content
that's playing "good".

An example is from Google Maps, which is now actually returning Expires:
headers:

violet:~/work/squid/squid-2-HEAD/tools adrian$ ./squidclient -m HEAD 
'http://mt0.google.com/mt?n=404&v=w2.61&x=14&y=10&zoom=13'
HTTP/1.0 200 OK
Content-Type: image/png
Expires: Fri, 14 Nov 2008 00:01:05 GMT
Last-Modified: Fri, 17 Dec 2004 04:58:08 GMT
Server: Keyhole Server 2.4
Content-Length: 2415
Date: Thu, 15 Nov 2007 00:01:05 GMT
Age: 252
X-Cache: HIT from violet.local
Via: 1.0 violet.local:3128 (squid/2.HEAD-CVS)
Proxy-Connection: close

I can't use the rep_header ACL with 'cache' as that particular lookup is done
(AFAICT) before the reply is handled, so this silently fails:

acl HaveExpiresRepHdr rep_header Expires ^[A-Z]
acl HaveLastModifiedRepHdr rep_header Last-Modified ^[A-Z]

cache allow QUERY HaveExpiresRepHdr
cache allow QUERY HaveLastModifiedRepHdr
cache deny QUERY

Any ideas? (Henrik?)




Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -


Re: caching "dynamic" content

2007-11-14 Thread Robert Collins
On Thu, 2007-11-15 at 09:37 +0900, Adrian Chadd wrote:
> G'day,
> 
> I'd like to propose a Squid modification - to cache "dynamic" content
> that's playing "good".

I long ago turned off the ? cache deny rule on my caches.

That said, making reply header checks be able to influence cache rules
sounds like a good idea.

-Rob
-- 
GPG key available at: .




md5 stuff under MacOSX

2007-11-14 Thread Adrian Chadd
Guys,

The md5 code hacks in squid-2.HEAD don't even compile under MacOS X because
of a lack of:

* a sys/types.h include before including the system/OpenSSL md5 header;
* defining MD5_DIGEST_CHARS or whatever that define is meant to be
  (see the sketch below)

Why is this so hard? :)
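
A minimal sketch of the kind of ordering/guard fix being described here
(header paths and the fallback macro are assumptions, not the actual
squid-2.HEAD code):

/*
 * Sketch only: include sys/types.h before the md5 header, and provide a
 * digest-size fallback if the header doesn't define one.
 */
#include <sys/types.h>          /* first: some md5 headers want u_int32_t etc. */
#include <openssl/md5.h>        /* or the platform's own md5 header */

#ifndef MD5_DIGEST_LENGTH       /* not every header provides a size macro */
#define MD5_DIGEST_LENGTH 16    /* an MD5 digest is always 16 bytes */
#endif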



Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -


Re: [squid-users] Solaris/OpenSSL/MD5 Issues

2007-11-14 Thread Adrian Chadd
On Wed, Nov 14, 2007, Amos Jeffries wrote:

> The bigger problem which you have just uncovered is that FreeBSD does
> provide a sys/md5.h, but does not define the MD5_DIGEST_CHARS or
> MD5_DIGEST_
> For the fix of FreeBSD. It should just be a little tweak of the #define for

Why don't we just force use of the openssl library for md5?
Will that bloat the Squid binary somehow?


-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -