Re: [squid-users] Squid Scalability

2009-04-17 Thread Gavin McCullagh
Hi,

On Sun, 05 Apr 2009, Gavin McCullagh wrote:

> Here's our current situation:
> 
> 
> Version: 2.6.STABLE18 (Ubuntu Hardy Package)
> OS: 32-Bit Ubuntu GNU/Linux (Hardy)
> CPU: Dual Core Intel(R) Xeon(R) CPU  3050  @ 2.13GHz
> RAM: 8GB
> HDD: 2x SATA disks (150GB, 1TB)
> Cache: 1x 600GB
> Users: ~3000
> RPS: 130
> Hit Ratio: 35-40%
> Byte Hit Ratio: ~13%

On re-reading this whole page, I realise how to estimate the number of
users.  I've started graphing the number of cache clients and it looks like
1200 is a better guess.

Gavin



Re: [squid-users] Squid Scalability

2009-04-18 Thread Gavin McCullagh
Hi,

On Sat, 18 Apr 2009, Nyamul Hassan wrote:

> Maximum number of users is not a very good indicator of measuring squid  
> performance.  I think, it makes more sense on finding out the maximum  
> req/sec that a box can handle, keeping the service timers within 
> reasonable limits.

I agree, but I just wanted to make sure I was quoting correct figures.  I
have quoted the requests per second we serve, though that's probably not an
absolute maximum.

> And, looking at your stats once again, I think you need to upgrade to 
> a 64-bit OS to properly use the full 8GB RAM.

Absolutely, but I'd need an upgrade to 64-bit hardware for that too :-)

Hopefully, the OS gains a little from the extra RAM available for caching,
etc, though I doubt it gains much to be honest.

Gavin



Re: [squid-users] reverse proxy filtering?

2009-04-19 Thread Gavin McCullagh
Hi,

On Sun, 19 Apr 2009, Jeff Sadowski wrote:

> I am helping a library to setup a way to display available books to the 
> outside.
> The internal website allows you to login and check out books which
> they want blocked to the outside. They do not want to modify the web
> developers code to fit their special needs, since it is a commonly
> used program to the libraries. They just want me to stop people from
> logging in and checking out books and they don't need it to be an
> absolute just difficult. When they should only be allowed to check
> books out from inside. 

I presume the login is required to do any task.

It might be simplest to just block access to any URLs which process a
checkout or any other disallowed task?  You could give a custom error
page which says "this task is not allowed to external users".  I suppose
it's better not to show users buttons which they can't use, but this
would be simple to implement, perform well and wouldn't require modifying
the HTML.
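
As a rough sketch (the URL paths and network range below are made up;
you'd substitute the real checkout/login URLs and the library's LAN):

  acl checkout urlpath_regex ^/cgi-bin/(login|checkout)
  acl libnet src 192.168.0.0/24
  deny_info ERR_NO_EXTERNAL_CHECKOUT checkout
  http_access deny checkout !libnet

where ERR_NO_EXTERNAL_CHECKOUT is a custom error page dropped into
squid's errors directory.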

Some people do modify content indirectly using squid's url_rewrite,
including this amusing one:
http://www.ex-parrot.com/~pete/upside-down-ternet.html 

which involves running a webserver on the squid machine.  The perl script downloads the
page to squid's web directory, translates it and rewrites the url to the
localhost location of the translated page.  It's a bit of a hack, but it
would probably work.

Gavin



Re: [squid-users] reverse proxy filtering?

2009-04-19 Thread Gavin McCullagh
Hi,

On Sun, 19 Apr 2009, Jeff Sadowski wrote:

> Actually no you can browse books without login in.

Why not just prevent logins then by having squid block the login processing
page with a custom error page stating "no logins from outside"?

> Cool thanks but I'm seriously looking at using privoxy and maybe even
> privoxy and squid together
> because it appears privoxy makes a terrible reverse proxy and would
> leave my proxy box open for others to download illegal content. So my
> current plan is to run privoxy on some random port and point the
> reverse proxy to that port and wala both inline editing via privoxy
> with a simple search replace string and no other sites except the one
> specified for the reverse proxy via squid.

Your call of course, but it seems like you're over-complicating life.  The
more links you have in the chain (squid, privoxy, ...) and the more complex
your setup, the more things can go wrong over the lifetime of the system.
For sure modifying the page content will be slower, but if you don't have
lots of users that may not matter.

Another thing to bear in mind is that upgrades to the web-based system may
well break either setup -- the URLs might change so your url blocking might
fail or the page content might change breaking your regular expressions.
In principle, a system which only _allowed_ certain URLs and blocked all
others would be more robust than blocking certain URLs, failing closed
rather than open.
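
In squid terms that might look something like this (paths hypothetical):

  acl public_pages urlpath_regex ^/(search|catalogue|opac)
  http_access allow public_pages
  http_access deny all

so anything the whitelist doesn't anticipate, including new URLs
introduced by an upgrade, is denied by default.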

Gavin



[squid-users] is there a squid "cache rank" value available for statistics?

2009-04-19 Thread Gavin McCullagh
Hi,

I'm wondering about ways to measure the optimum size for a cache, in terms
of the "value" you gain from each GB of cache space.  If you've got a 400GB
cache and 99% of your hits come from the first 350GB, there's probably
no point looking for a larger cache.  If only 80% come from the first
350GB, then a bigger cache might well be useful.

I realise there are rules of thumb for cache size, but it would be
interesting to be able to analyse a particular squid installation.

Squid obviously removes objects from its cache based on the chosen
cache_replacement_policy.  It appears from the comments in squid.conf that
in the case of the LRU policy, this is implemented as a list, presumably a
queue of pointers to objects in the cache.  Objects which come to the head
of the queue are presumably next for removal.  I guess if an object in the
cache gets used it goes back to the tail of the queue.   I suppose this
process must involve linearly traversing the queue to find the object and
remove it, which is presumably why heap-based policies are available.

I wonder if it would be feasible to calculate a "cache rank", which
indicates the position an object was within the queue at the time of the
hit.  So, perhaps 0 means at the tail of the queue, 1 means at the head.
If this could be reported alongside each hit in the access.log, one could
draw stats on the amount of hits served by each portion of the queue and
therefore determine the value of expanding or contracting your cache.

In the case of simple LRU, if the queue must be traversed to find each
element and requeue it (perhaps this isn't the case?), I suppose one could
count the position in the queue and divide by the total length.  
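
That is, roughly:

  rank = (position from tail) / (queue length)

so an object requeued from 100,000 places up a 1,000,000-object queue
would log a rank of 0.1, while one just about to be evicted from the head
would log a rank close to 1.0.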

With a heap, things are more complex.  I guess you could give an indication
of the depth in the heap but there would be so many objects on the lowest
levels, I don't suppose this would be a great guide.  Is there some better
value available, such as the key used in the heap maybe?

Or perhaps the whole idea is flawed somehow?

Comments, criticisms, explanations, rebukes all welcome.
Gavin




Re: [squid-users] is there a squid "cache rank" value available for statistics?

2009-04-19 Thread Gavin McCullagh
On Sun, 19 Apr 2009, Gavin McCullagh wrote:

> In the case of simple LRU, if the queue must be traversed to find each
> element and requeue it (perhaps this isn't the case?), 

On reflection, I presume this is not the case.  I imagine the struct in ram
for each cache object must include a pointer to prev and next.  That makes
me wonder how "heap lru" improves matters.  I guess I need to go and read the
paper referenced in the notes :-)

Gavin



Re: [squid-users] is there a squid "cache rank" value available for statistics?

2009-04-20 Thread Gavin McCullagh
Hi,

On Mon, 20 Apr 2009, Amos Jeffries wrote:

> Squid suffers from a little bit of an anachronism in the way it stores
> objects. The classic ufs systems essentially use round-robin and hash to
> determine storage location for each object separately. This works wonders
> on ensuring no clashes, but not so good for retrieval optimization.
> Adrian Chadd has done a lot of study and some work in this area
> particularly for Squid-2.6/2.7. His paper for FreeBSD conference is a good
> read on how disk storage relates to Squid.
> http://www.squid-cache.org/~adrian/talks/20081007%20-%20NYCBSDCON%20-%20Disk%20IO.pdf

Thanks, I'll take a look.

> > I realise there are rules of thumb for cache size, it would be interesting
> > to be able to analyse a particular squid installation.
> 
> Feel free. We would be interested in any improvements you can come up with.

I didn't say anything about being able to improve anything! ;-)

I was hoping if a given user could plot a report of %age of hits coming
from each successive GB of the cache (or a rough guide to that), they could
more easily work out, based on their own workload, what the best value
cache size would be.  Obviously I don't mean physical disk location, I mean
"if my cache were eg halved in size, how many hits would I lose?".

I was naively hoping that the code might easily be leveraged to capture
this "cache rank" and log it for each hit, but the more I think about it,
the less likely this seems.

> IIRC there is a doubly-linked list with tail pointer for LRU.

That was my guess alright, so requeueing is O(1).  I presume traversal is
not necessary for a HIT though, which means the position in the removal
queue may not be easy to determine.

Looking over the HP paper, heap LRU seems to have been created so that the
comparison between LRU, GDSF and LFUDA could be more easily made.  I'm
guessing LRU is as quick as (and simpler than) a heap when implemented as
a list.

> > In the case of simple LRU, if the queue must be traversed to find each
> > element and requeue it (perhaps this isn't the case?), I suppose one could
> > count the position in the queue and divide by the total length.
> 
> Yes, same big problems with that in LRU as displaying all objects in the
> cache ( >1 million is not uncommon cache sizes) and regex purges.

Definitely not something you want to do often :-)

> > With a heap, things are more complex.  I guess you could give an
> > indication
> > of the depth in the heap but there would be so many objects on the lowest
> > levels, I don't suppose this would be a great guide.  Is there some better
> > value available, such as the key used in the heap maybe?
> 
> There is fileno or hashed value rather than URL. You still have the same
> issues of traversal though.

Hmmm.  I guess I need to read up a little more on how heap LFUDA is
implemented.

> If you want to investigate. I'll gently nudge you towards Squid-3 where
> the rest of the development is going on and improvements have the best
> chance of survival.
>
> For further discussion you may want to bring this up in squid-dev where
> the developers hang out.

Yeah, fair points.  Thanks.

Gavin



Re: [squid-users] is there a squid "cache rank" value available for statistics?

2009-04-20 Thread Gavin McCullagh
On Mon, 20 Apr 2009, Amos Jeffries wrote:
> Gavin McCullagh wrote:
>> Obviously I don't mean physical disk location, I mean
>> "if my cache were eg halved in size, how many hits would I lose?".
>
> Ah, in my experience that is gained from long term monitoring of the hit  
> rates and tweaking.
> For example on the wiki cache we had a small outage with the disk and  
> flipped it over to RAM-only for a few weeks. The munin graphs showed a  
> ~20% reduction in byte-hit ratio and ~15% drop in request-hit ratio. On  
> just 2 req/sec.

I guess that'll work alright, though it's a little inconvenient.

>> That was my guess alright, so requeueing is O(1).  I presume traversal is
>> not necessary for a HIT though, which means the position in the removal
>> queue may not be easy to determine.
>
> No, that's hashed, and the HIT gets immediately cut out and pasted at the  
> start. So still around O(1) or similar for the list actions.

Right.  

As always, many thanks for the explanations,

Gavin



Re: [squid-users] Re: is there a squid "cache rank" value available for statistics?

2009-04-23 Thread Gavin McCullagh
Hi,

On Mon, 20 Apr 2009, RW wrote:

> On Sun, 19 Apr 2009 19:03:36 +0100
> Gavin McCullagh  wrote:
> 
> > With a heap, things are more complex.  I guess you could give an
> > indication of the depth in the heap but there would be so many
> > objects on the lowest levels, I don't suppose this would be a great
> > guide.  Is there some better value available, such as the key used in
> > the heap maybe?

> You could probably do this with a modified version of heap LRU, using a
> counter rather than a timestamp as a key. You could then work-out the
> relative position in the queue, from the key value, the current counter
> value, and the key value at the top of the heap.

That's an interesting thought.  So if you changed to a counter one could
use:

  <current counter value> - <object's key value>

as an indicator of how far from the top of the heap you are.  I presume
heap lru uses seconds since epoch or seconds since process start or some
such as the existing key?  I guess even using the timestamp difference
or something like

  (<maxheapkey> - <key>) / (<maxheapkey> - <minheapkey>)

might be a rough guide.  Maxheapkey is probably just the current timestamp
on any reasonably busy squid install.  The above is probably a bit
misleading though as the density of objects in time would vary pretty
wildly.

I'd like to find something that would work across all the removal methods
(currently we're using LFUDA).  I'll try and make time to look at the code
and see what the keys are in each one.

On Mon, 20 Apr 2009, RW wrote:

> I should add that since heap key values don't persist over a restart,
> the heap would not initially be in LRU order, so you would have to wait
> for the normal LRU reference age before any statistics are meaningful
> -or start with an empty cache.

I see, thanks.

Gavin




Re: [squid-users] Long running squid proxy slows way down

2009-04-25 Thread Gavin McCullagh
Hi Amos,

On Sat, 25 Apr 2009, Amos Jeffries wrote:

>> ipcache_low 90
>> # ipcache_high 95
>> ipcache_high 95
>> cache_mem 1024 MB
>> # cache_swap_low 90
>> cache_swap_low 90
>> # cache_swap_high 95
>> cache_swap_high 95
>
> For cache >1GB the difference of 5% between high/low can mean long  
> periods spent garbage-collecting the disk storage. This is a major drag.  
> You can shrink the gap if you like less disk delay there.

Could you elaborate on this a little?  If I understand correctly from the
comments in the template squid.conf:

  (swap_usage < cache_swap_low)
-> no cache removal
  (cache_swap_low < swap_usage < cache_swap_high)
-> cache removal attempts to maintain (swap_usage == cache_swap_log)
  (swap_usage ~> cache_swap_high)
-> cache removal becomes aggressive until (swap_usage == cache_swap_log)

It seems like you're saying that aggressive removal is a big drag on the
disk so you should hit it early rather than late so the drag is not for
a long period.  Would it be better to calculate an absolute figure (say
200MB) and work out what percentage of your cache that is?  It seems like
the 95% high watermark is probably quite low for large caches too?

I have 2x400GB caches.  A 5% gap would leave 20GB to delete aggressively
which might take quite some time alright.  A 500MB gap would be 0.125%.

cache_swap_low 97.875
cache_swap_high 98

Can we use floating point numbers here?  Would it make more sense for squid
to offer absolute watermarks (in MB offset from the total size)?

Gavin



Re: [squid-users] Long running squid proxy slows way down

2009-04-26 Thread Gavin McCullagh
Hi,

On Sun, 26 Apr 2009, Amos Jeffries wrote:

> almost. The final one is:
>  -> aggressive until swap_usage < cache_swap_low
>  which could be only whats currently indexed (cache_swap_log), or could  
> be less since aggressive might re-test objects for staleness and discard  
> to reach its goal.

I had presumed that squid had a heap or other $STRUCTURE which kept the
cache objects in order of expiry so they could be purged immediately they
expired.  Thinking about it though, perhaps that would kill off all
possibility for TCP_IMS_HITs?

Sorry to be constantly peppering you with these questions, I just find it
all very interesting :-)

>>  Would it be better to calculate an absolute figure (say
>> 200MB) and work out what percentage of your cache that is?  It seems like
>> the 95% high watermark is probably quite low for large caches too?
>
> I agree. Something like that. AFAICT the high being less than 100% is to  
> allow X amount of new data to arive and be stored between collection  
> cycles. 6 GB might be reasonable on a choked-full 100 MB pipe with 5  
> minute cycles. Or it might not.

As I mentioned we have a 20GB gap by default and are on a 40MB pipe which
is often quite choked.  I can't say we've noticed the collection cycles but
maybe we're not measuring it right.

I'll probably change the thresholds to 98%,99%.
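
i.e. something like:

  cache_swap_low  98
  cache_swap_high 99

which on a 400GB cache_dir leaves a 4GB collection window per cache
rather than 20GB.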

>>  Would it make more sense for squid
>> to offer absolute watermarks (in MB offset from the total size)?
>
> Yes this is one of the ancient aspects remaining in Squid and different  
> measures may be much better. I'm having a meeting with Alex Rousskov in  
> approx 5 hours on IRC (#squiddev on irc.freenode.net) to discuss the  
> general store improvements for 3.2. This is very likely to be one of the  
> topics.

Please do let us know how you get on :-)

Thanks as always,
Gavin



Re: [squid-users] Question on changing from ufs to aufs+coss

2009-04-27 Thread Gavin McCullagh
Hi,

On Mon, 27 Apr 2009, Pandu E Poluan wrote:

> One thing is still unclear for me, though: Why is it not a good idea to  
> have 2+ cache on same disk?
>
> In my understanding (CMIIW), aufs is better for larger objects while  
> coss is better for smaller objects (or the other way around).

Is there a doc explaining COSS somewhere?  I'm using two large AUFS caches
on two disks.  Should I consider a COSS cache in addition to or instead of
one of these?  This is squid 2.6-stable18 running on Ubuntu Hardy (yes I am
considering upgrading to v2.7 or v3).

Gavin



Re: [squid-users] Long running squid proxy slows way down

2009-04-27 Thread Gavin McCullagh
Hi,

On Mon, 27 Apr 2009, Wilson Hernandez - MSD, S. A. wrote:

> I have a similar setup, squid was slow and crashing when it had a long  
> time running, crashing every three to six days. I never found out why it  
> crashed. I looked in the log files and couldn't find anything. It just  
> crashed for no reason. There are some posts to the list about it. I  
> decided to restart the system everyday from a cron job at 4am. I know  
> that doesn't sound too stable as I'm running it on a linux box but, it  
> worked. It hasn't crashed since.

Did you get any message in /var/log/squid/* or /var/log/syslog?

I had a similar experience and it turned out to be down to the RAM usage of
squid exceeding 3GB (the limit for a process on a 32bit OS).  As the cache
memory filled up, squid's ram size increased until it restarted, and began
filling up again.  I reduced the mem_cache size and everything is fine
since then.

Gavin



Re: [squid-users] Long running squid proxy slows way down

2009-04-27 Thread Gavin McCullagh
On Mon, 27 Apr 2009, Matus UHLAR - fantomas wrote:

> On 27.04.09 13:35, Gavin McCullagh wrote:
>
> > I had a similar experience and it turned out to be down to the RAM usage of
> > squid exceeding 3GB (the limit for a process on a 32bit OS).  As the cache
> > memory filled up, squid's ram size increased until it restarted, and began
> > filling up again.  I reduced the mem_cache size and everything is fine
> > since then.
> 
> ... which most probably happens due to oversized cache_mem, not noticing 
> it's only about memory cache, not about memory usage:
> 
> http://wiki.squid-cache.org/SquidFaq/SquidMemory

Absolutely.  I meant cache_mem not mem_cache :-)

Gavin



[squid-users] squid + auth + safari + SSL = TCP_DENIED/407

2009-04-30 Thread Gavin McCullagh
Hi,

one of our Mac people has been complaining that he can't get into certain
SSL sites.  I borrowed a Mac and found that there does indeed seem to be a
problem, though apparently not on all SSL sites (a login on www.bebo.com
is an example that does give the problem).  I'm not sure of this but it
looks like it might be where there's a POST request over SSL.

I noticed this:

http://www2.tr.squid-cache.org/mail-archive/squid-users/200709/0109.html

so I tried turning off authentication and it worked.

I'm using squid-2.6-stable18 which I'm well aware is old.  Is this a bug in
squid or in safari, and is that known for sure?  Does anyone know if an
upgrade to squid would sort it out?

If not, I may have to put in an ACL to allow one of:

 - all Macs to be unauthenticated
 - all SSL to be unauthenticated
 - all requests with safari browser strings using SSL to be unauthenticated

or something like that.  Has anyone had to do this?  Is there a known "best
way"?

Thanks in advance,
Gavin



Re: [squid-users] squid + auth + safari + SSL = TCP_DENIED/407

2009-05-01 Thread Gavin McCullagh
Hi,

On Fri, 01 May 2009, Amos Jeffries wrote:

> This one seems like a browser bug like Henrik says in that post you found.
>
> The only part Squid has in any of this is to open a CONNECT tunnel and  
> shove data bits between browser and server. And auth credentials,  
> challenge or POST content which goes through the tunnel is not touched  
> by Squid in any way.

I presume the auth credentials for squid are outside the data bits that
need shoving, so it seems like it must be an interaction between squid and
safari, but yes, it does sound like a browser bug.

I guess I'll try and craft an ACL which says
"useragent == safari AND SSL"
and allow it without authentication.
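
Something along these lines, perhaps (untested, and the browser regex may
need tuning):

  acl safari browser Safari
  # CONNECT is already defined in the default config:
  # acl CONNECT method CONNECT
  http_access allow safari CONNECT

placed before the http_access line which demands authentication.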

Thanks,
Gavin



Re: [squid-users] Transparent proxy with HTTPS on freebsd

2009-05-04 Thread Gavin McCullagh
Hi,

On Mon, 04 May 2009, Matus UHLAR - fantomas wrote:

> On 29.04.09 04:58, nyoman karna wrote:

> > you probably may use PAC (as Amos suggested)
> > but IMO it ruin the basic idea of using transparent proxy
> > (which is user does not need to put any setting in their browser)
> 
> the whole idea of intercepting proxy (also called transparent) is sick.

Would you care to substantiate that in a bit more detail?

> WPAD is way to go - browser will autodetect the proxy, so user can log there
> and all problems caused by intercepting connections will be gone.

I've been down this road.  We (a 3rd level college) have hundreds of users
walking on and off a campus with their laptops, mobile phones, netbooks,
pdas, etc.  We used to have posters, docs, everything set up to tell people
how to use the proxy.  We had a proxy.pac.  The support load was massive.
The number of people coming into our office for help setting it up was
huge.  The number of applications that use HTTP but don't support proxy.pac
files is surprisingly large.  The users leave the campus and have to undo
the proxy settings, then redo them when next on campus.

It was imperative for us to be able to give completely transparent web
access.  It's also a big requirement to have caching to reduce our
bandwidth and give us some kind of logging.  So we have transparent
proxying of http traffic and we simply allow https traffic out.

This policy has been hugely successful.  You might argue that we should
just allow all http and https traffic out but that is more expensive,
slower and harder for us to keep track of (I'm not that keen on logging but
it's necessary for a host of reasons).

As it is now, the web just works for everyone.  People are far happier and
so are we.

Gavin



Re: [squid-users] Transparent proxy with HTTPS on freebsd

2009-05-04 Thread Gavin McCullagh
Hi Jeff,

On Mon, 04 May 2009, Jeff Sadowski wrote:

> On Mon, May 4, 2009 at 3:35 PM, Gavin McCullagh  
> wrote:
> >> the whole idea of intercepting proxy (also called transparent) is sick.
> >
> > Would you care to substantiate that in a bit more detail?
> 
> If you're blocking content that would violate rights, maybe; if you are
> doing it to speed things up or blocking sites that have no place in
> the current facility I can not see how it can be claimed as sick.
> I think blocking most porn from schools and work is right. Maybe even
> blocking youtube from work because of how much time is wasted.

I think this misses the issue.  A web proxy is indeed a convenient way to
apply these sorts of blocks.  However, whether you force people to
configure proxies in order to get web access or you do it transparently
doesn't change the blocking.

Currently we have a very short list of blocked sites based mostly on file
sharing.  Personally, I'd like to remove that as I don't consider it
useful.  In certain labs (ie where the students use our computers) at
certain busy times of the year we occasionally block "time-waster" sites in
order to free up those computers for students doing assignments.  Those who
use their own laptops on wifi don't experience that.

Our students are adults.  We don't generally block based on content.  In
Ireland (where we are), primary and secondary schools are all given a
government-sponsored central broadband connection which is content filtered
transparently.  It's not squid, but it's the same principle.  Personally, I
don't really like the idea, but being pragmatic, I understand why they did
it.  Prior to filtering, a large number of teachers were dead set against
giving web access to students (we had bought our own connection).  Now that
they have a comfort blanket of state-sponsored content filtering, they're
fine with students using it.  Sadly, sites like youtube are blocked due to
unsuitable content, which is really a shame as there is lots of very useful
content.

We recently started using HAVP to block viruses/malware, but I think most
people would agree that that's in the student's interest.

Transparent proxying (as opposed to wpad) doesn't make any of this blocking
easier, though I guess perhaps it makes it less apparent.  However, it
makes net access far more convenient (as against wpad) for the user.

Gavin



Re: [squid-users] Transparent proxy with HTTPS on freebsd

2009-05-06 Thread Gavin McCullagh
Hi,

On Wed, 06 May 2009, Matus UHLAR - fantomas wrote:

> On 04.05.09 22:35, Gavin McCullagh wrote:
> > Would you care to substantiate that in a bit more detail?
> 
> Making clients think they connect to the destination server when they do
> not, breaks many things. It disables authentication, causes some TCP
> problems (pmtu discovery?)...

Many thanks for the extra info.

Disabling authentication is unfortunate, but anyone managing a network and
proxy server who decides to use transparent proxying necessarily makes the
decision not to use authentication.

PMTU discovery is not something I had thought about, I must say.  At a
guess the main issue is that if a router between client and proxy sends a
"datagram too big", it'll be addressed to the IP of the upstream host and
so will not get to the proxy.   In our case (where the MTU is
consistent across the whole path), that won't be an issue but I can see how
it could be.  I guess you could turn off PMTU discovery on the proxy to
solve this, though that's a bit of a sledgehammer.

There would also be an ambiguous MTU for the client (ie that of the
client<->proxy and the client<->server) which would depend on what port the
client was connecting on (eg it could mix http and https).  I'd guess,
perhaps wrongly (and assuming the icmps are not blocked) the client should
just end up with the minimum MTU for both paths?

> That's bad, luckily many browsers can turn on autodetection and use it when
> available.

You mean the browser downloading http://wpad.<domain>/wpad.dat?  This has
been pretty flakey in our experience.  In most cases you seem to have to
turn it on explicitly, which is a huge pain as students don't know how.

> Well, I always call intercepting a thing you should do in "last resort" and
> all troubles caused by the interception should be pointed as client errors.

Fair enough.

> Yes, if you need, keep that there, but I hope you didn't stop providing WPAD
> for anyone who supports it.

We still provide it alright, though I don't think it gets used much.  One
of our networks, where we require authentication, still uses it all the
time.

Gavin



Re: [squid-users] speeding up browsing? any advice?!

2009-05-10 Thread Gavin McCullagh
Hi,

On Sun, 10 May 2009, Roland Roland wrote:

> users on my network have been complaining of slow browsing sessions for a 
> while now..
> i'm trying to figure out ways to speed sessions up without necessarily  
> upgrading my current bandwidth plan...

Squid may help with this.  However, you don't seem to say that you have
determined the cause of the slowness yet.  One potential reason is your
users are saturating the available bandwidth.  Another however, is that you
have loss on a link somewhere.  Another might be your ISP over-contending
you or not giving you the bandwidth you expect.  Another might be slow DNS.  

Squid might indeed help in any or all of these situations.  However, I'd be
inclined to monitor the edge router device with MRTG or similar and track
exactly how much bandwidth is being used.  Also, I'd run smokeping across
the link to some upstream sites and see have you any packet loss.  If you
know the cause, you'll be better able to address the problem.

> though one more question if possible, is there anything i could
> possibly do to speed up browsing aside what i mentioned earlier? 
> 
> keep in mind that i only added an allow ACL to my subnet... and that's
> it! is it enough?

For a start, you may want to look at increasing the cache_dir size.  The
default is 1GB which is pretty small.  The larger your cache, the larger
(albeit decreasingly) your hit rate will be. Once you have a large cache,
you probably want to increase maximum_object_size. If you want to save
bandwidth "Heap LFUDA" may be the best cache removal policy, as opposed to
LRU.  There might also be some sense in looking at delay pools to better
prioritise the bandwidth given to individual users.
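
As a very rough starting point (the sizes here are illustrative rather
than a recommendation; they depend on your disks and traffic):

  cache_dir aufs /var/spool/squid 50000 16 256   # ~50GB on-disk cache
  maximum_object_size 102400 KB                  # cache objects up to 100MB
  cache_replacement_policy heap LFUDA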

Optimising squid's caching can be a big complicated job.

Gavin



Re: [squid-users] transparent proxy with Active Directory Login

2009-05-16 Thread Gavin McCullagh
On Thu, 14 May 2009, Amos Jeffries wrote:

> What can be done is to glean some details such as machine IP and do some
> local not-quite-auth testing on it to see who is logged in and get their
> username back (NP: not password). AD may be able to map IP to current
> user. This has to be done in the background with an external_acl_type
> helper. It's called out-of-band authorization.

Are there any docs or howtos around on this?  We use authentication on one
subnet, but it's a bit of a pain.  We're not really that concerned to
require people to remember passwords, we just want to work out who the user
is with a reasonable level of accuracy.  Authenticated proxies seem to
break various clients, so out-of-band might be an interesting
alternative.

Gavin



Re: [squid-users] Thanks for 3.0-STABLE14/15

2009-05-21 Thread Gavin McCullagh
On Thu, 21 May 2009, Travel Factory S.r.l. wrote:

> it would be nice to know your configuration (cpu/ram/disk/heap/etc etc etc)

In particular, if you could give data for this page...

http://wiki.squid-cache.org/KnowledgeBase/Benchmarks

Gavin



[squid-users] tproxy vs DNAT

2009-05-29 Thread Gavin McCullagh
Hi,

there's been a lot of talk about TPROXY being added back into the linux
kernel and squid changing to support it.

Currently, we do transparent proxying by policy routing port 80 traffic to
the proxy server then using DNAT (iptables) on the proxy server.  
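
For context, a sketch of that rule (interface name is a placeholder, and
REDIRECT is the local-destination form of DNAT):

  iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 \
      -j REDIRECT --to-ports 3128

with squid listening on "http_port 3128 transparent".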

Could someone point me to something that explains the benefit of TPROXY
over DNAT?  We would look to migrate over if there's a substantial benefit.

Thanks in advance,
Gavin



Re: [squid-users] Squid is running but nothing happens

2009-06-11 Thread Gavin McCullagh
Hi,

On Wed, 10 Jun 2009, oar22 wrote:

> Sorry if this has been asked a bunch of times before, but seem to have hit a
> dead end here. I'm running Squid version 3.0 on linux. When it starts up it
> shows that it's accepting HTTP connections at the correct IP. I'm able to
> test using squidclient and that part seems to work just fine -- the correct
> HTML comes up, all the relevant log files get written to, and I don't get
> any error messages. However, when I put in the same IP in the browser's
> proxy settings along with the default IP, any page I try to go to just times
> out. I turned off the Windows firewall and the antivirus on the client
> machine but to no avail. Any suggestions would be very much appreciated.

From the client machine with the browser, can you:

1. Ping the proxy server (ie make sure there's no routing issues).
2. Telnet to the proxy server on the relevant port, issue
      GET http://google.com/ HTTP/1.0
   and check you get a reply (ie make sure a firewall isn't blocking
   connections).

and if you like:

3. Run "tcpdump -i <interface> tcp host <client ip>" on the proxy server
   to see what network conversation the proxy server has with the client.
   As with [2], no packets probably indicates a firewall blocking
   the path.

Gavin




Re: [squid-users] Are you on mobile/handset?

2009-06-16 Thread Gavin McCullagh

Hi

On 16 Jun 2009, at 08:05, Luis Daniel Lucio Quiroz wrote:

> Hi Squids,
>
> How do you think should be the best way to detect if a user is
> surfing inet throut its mobile/handset?

The user agent string sounds like the obvious answer?

Gavin


[squid-users] squid date based acls

2009-06-17 Thread Gavin McCullagh
Hi,

I was just looking to set up ACLs based on dates, eg something like:

acl termtime   date 01/09-31/05
acl summertime date 01/06-31/08

but I can't seem to find that feature in the manual.  Is this not possible
or have I missed something?  If it's not currently in squid, would it be a
useful feature for people other than me?
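
If it's not there, one workaround might be an external ACL helper which
answers based on the current date.  A rough, untested sketch:

  external_acl_type datecheck ttl=3600 %SRC /usr/local/bin/termtime.sh
  acl termtime external datecheck

where termtime.sh is something like:

  #!/bin/sh
  # answer OK during Sept-May, ERR otherwise
  while read line; do
    m=$(date +%m)
    if [ "$m" -ge 9 -o "$m" -le 5 ]; then echo OK; else echo ERR; fi
  done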

Gavin



Re: [squid-users] https from different Subnet not working

2009-07-14 Thread Gavin McCullagh
Hi Ralph,

I'll add a couple of thoughts, but not really an answer.

On Tue, 14 Jul 2009, Jarosch, Ralph wrote:

> If I connect from an branch office with the subnet 10.37.34.*/24 to an https 
> website i´ve no Problems.
> If I do the same from another location with an subnet like 10.39.85.*/24 I 
> get the following error message.

Presumably you're using the same URL to test in both places and the same
proxy settings?

I'll note in passing that you're running a very ancient version of squid
(2.5.STABLE12).  I doubt an upgrade would fix your problem, but at some
point, you should consider an upgrade nonetheless.

> The requested URL could not be retrieved
> 
> While trying to retrieve the URL: http.yyy.xxx:443 
> The following error was encountered: 
> Unable to determine IP address from host name for 
> The dnsserver returned: 
> Name Error: The domain name does not exist. 
> This means that: 
>  The cache was not able to resolve the hostname presented in the URL. 
>  Check if the address is correct. 
> Your cache administrator is webmaster. 
> 
> Generated Tue, 14 Jul 2009 08:10:39 GMT by xxx (squid/2.5.STABLE12)
> 
> The requester url was https://www.ebay.com

It's a little odd that you removed the URL from the output, only to tell us
it afterward, but how and ever.  Also, you've removed the name of the web
proxy that generated the error, which is a little unhelpful as you appear
to have 5 proxy servers.

What the above error tells you is that the squid web proxy couldn't get a
DNS response for the site you wanted to go to, ie

"  The cache was not able to resolve the hostname presented in the URL."

It seems surprising that that problem would happen in a repeatable way that
affected one client but not another.

I note that you have several parent cache peers:

> cache_peer 10.37.132.5 parent 3128 7 no-query proxy-only no-digest sourcehash
> cache_peer 10.37.132.6 parent 3128 7 no-query proxy-only no-digest sourcehash
> cache_peer 10.37.132.7 parent 3128 7 no-query proxy-only no-digest sourcehash
> cache_peer 10.37.132.8 parent 3128 7 no-query proxy-only no-digest sourcehash

I wonder could it be that only one of the cache peers is having DNS issues?
Could you point a browser directly at each individual parent cache and see
can you get the webpage you're looking for?

Gavin



[squid-users] Collapsed Forwarding was Re: [squid-users] Kerberos authentication & pre-caching in Squid for Windows

2009-08-08 Thread Gavin McCullagh
On Sat, 08 Aug 2009, Amos Jeffries wrote:

> In a school situation you will also find the collapsed_forwarding  
> features of Squid very useful. It can reduce/collapse a full classroom  
> worth of duplicate requests for the same lesson website, down to a set  
> of single requests to fetch the page once.

Interested in this benefit, I enabled it on our squid 2.6.18 server.

I haven't tried to measure the effects yet, but I do note some of these
entries in the logs:

  Aug  8 19:31:33 watcher squid[6908]: clientProcessExpired: collapsed request STALE!

Is this normal or does it indicate anything bad?

Thanks in advance,
Gavin



Re: [squid-users] Howto run Internet Explorer without proxy setting in Internet Options

2009-08-09 Thread Gavin McCullagh
Hi,

On Sun, 09 Aug 2009, Andrej van der Zee wrote:

> I mean, is there really nothing on Windows that you can instruct to
> forward all outgoing TCP/UDP traffic to port 80/443 to a port on a
> remote machine? I just need the remote proxy server to be accessed
> transparently, without using the browser setting. Is that really
> impossible on Windows? That is hard to believe, there must be a way.

It sounds like what you're looking for is that the proxy be transparent to
the web server not to the web browser.

If you're in a position to set up squid on the remote server, a VPN might
be the best approach to this.  You can do that in a very simple way with
ssh:

http://fermiparadox.wordpress.com/2008/06/12/vpn-with-openssh/
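
The short version (needs root at both ends, PermitTunnel enabled in the
server's sshd_config; the addresses here are made up):

  ssh -w 0:0 root@remote.example.com
  # then give each end of the tun0 link an address, eg
  ifconfig tun0 10.99.0.1 pointopoint 10.99.0.2 up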

Though you can probably also configure squid not to forward the headers
which reveal its presence.

Gavin



Re: [squid-users] caching videos (flv)

2009-08-17 Thread Gavin McCullagh
On Mon, 17 Aug 2009, Mahesh Ollalwar wrote:

> I have implemented squid to cache videos (.flv) and allocated 45 GB of  
> space for cache_dir, my problem here is that, I can store data only upto  
> 10-15 days depend upon the video size. Is there any way to store cache  
> in compressed format.

It might be possible but you likely wouldn't gain much if anything from it.
FLVs are heavily compressed already so you're unlikely to get much further
compression.  Try grabbing a copy of one of those FLVs and running gzip on
it to see.
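
Something like (the filename is a placeholder):

  cp somevideo.flv /tmp/ && gzip -v /tmp/somevideo.flv

will print the compression ratio, which I'd expect to be only a few
percent.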

Gavin





Re: [squid-users] SQUID PAC-File and JAVA (1.6.11)

2009-08-17 Thread Gavin McCullagh
Hi,

On Mon, 17 Aug 2009, Volker Jahns wrote:

> We have a lot of IE clients here with a url..proxy.pac file as proxy
> configuration and without automatically finding a proxy server. Whenever we
> use SSL explorer and a JAVA program the final sync failed. If I change the
> configuration to the same manual proxy server and its port it works.

In my experience, what the Java VM can read in proxy.pac/wpad.dat files is
somewhat more limited than what IE can.  I'd suggest you keep a _very_
simple wpad if at all possible.  For example, don't try and code the
wpad.dat to use the client's own IP address.  That really doesn't work in
lots of situations.

A tcpdump/windump on the computer watching port 80 should give you an idea
whether Java is really following the proxy settings you think it should.

If you want you can post your script here.

Gavin




Re: [squid-users] RE: SQUID PAC-File and JAVA (1.6.11) SOLVED?

2009-08-18 Thread Gavin McCullagh
Hi,

On Tue, 18 Aug 2009, Henrik Nordstrom wrote:

> On Tue, 18 Aug 2009 at 03:23 -0500, Bill Allison wrote:
> 
> > For example, on a Windoze client (XP-SP3 at least) on VPN, the
> > javascript function myIPAddress() will return the IP address of the
> > *outside* of the tunnel
> 
> Yes, and a number of other similar situations as well.

I've heard talk of IPv6 addresses causing issues too.

> My general recommendation is to code the needed myIPAddress logics on the
> server side instead if possible. I.e. have the PAC served by a webserver
> script, using REMOTE_ADDRESS as input determining where the client is
> located.  But that obviously won't work very well on roadwarrior
> clients..

We tend to have the domain assigned to each computer by the DHCP server
differ depending on the machine's address range.  This means
wpad.<domain> is different for each ip range so you can have static wpad
files.

However, I was hoping to have them all hosted on one apache server using a
vhost but it turns out that some but not all of the browsers (firefox and
some versions of IE) don't load http://wpad.<domain>/wpad.dat.  Instead
they do a DNS lookup, then download http://<resolved ip>/wpad.dat which
means you can't use vhosts.  This is a real PITA.  For now, we've got the
wpads on different servers, but a PHP-driven wpad.dat was my next solution.

Gavin



Re: [squid-users] Java not working behind squid

2009-08-25 Thread Gavin McCullagh
Hi,

On Tue, 25 Aug 2009, Truth Seeker wrote:

> I have squid-3.0.STABLE13-1.el5 on CentOS 5.3 which is authenticating with 
> 2003 AD (kerb + winbind) and have different acls (group based) in place.
> 
> The problem is, java is not working for our users. Previously they all were 
> using ISA, and java was working for them.
> 
> in the following site;
> 
> http://www.dailyfx.com/  3rd coloumn in the right side shows the "Live 
> currency rates" which is working with java. This is a must in our 
> environment... 
> 
> Awaiting your response...

We have a similar setup on one VLAN, with squid on linux authenticating
users using active directory.  We've seen lots of issues with Java not
being able to authenticate.

Testing the page you're talking about (albeit with a linux desktop), I get
a java popup window asking me for my AD username/password/domain; I type it
in but it repeatedly fails.

The squid access.log says:

1251204847.837      0 172.16.1.3 TCP_DENIED/407 1846 CONNECT balancer.netdania.com:443 - NONE/- text/html
1251204847.842      0 172.16.1.3 TCP_DENIED/407 1846 CONNECT balancer.netdania.com:443 - NONE/- text/html

I'm not sure if these lines in cache.log are relevant or not.

[2009/08/25 13:42:00, 1] libsmb/ntlmssp.c:ntlmssp_update(267)
  got NTLMSSP command 3, expected 1
[2009/08/25 13:42:00, 1] libsmb/ntlmssp.c:ntlmssp_update(267)
  got NTLMSSP command 3, expected 1
[2009/08/25 13:42:01, 1] libsmb/ntlmssp.c:ntlmssp_update(267)
  got NTLMSSP command 3, expected 1
[2009/08/25 13:42:01, 1] libsmb/ntlmssp.c:ntlmssp_update(267)
  got NTLMSSP command 3, expected 1
[2009/08/25 13:47:02, 1] libsmb/ntlmssp.c:ntlmssp_update(267)
  got NTLMSSP command 3, expected 1

My usual workaround is to add an ACL for that site which is far from ideal.
I've added the following ACL:

acl dailyfx dstdomain balancer.netdania.com
http_access allow dailyfx CONNECT

That works around the issue for me.  I still get prompted for the username
and password and the logs suggest some traffic isn't getting through.

1251205769.600  14385 172.16.1.3 TCP_MISS/000 7263 CONNECT balancer.netdania.com:443 - FIRST_UP_PARENT/172.20.2.3 -
1251205771.233      1 172.16.1.3 TCP_DENIED/407 1954 GET http://balancer.netdania.com/StreamingServer/StreamingServer? - NONE/- text/html
1251205771.239      3 172.16.1.3 TCP_DENIED/407 1969 GET http://balancer.netdania.com/StreamingServer/StreamingServer? - NONE/- text/html
1251205771.516    277 172.16.1.3 TCP_MISS/200 1443 GET http://balancer.netdania.com/StreamingServer/StreamingServer? gavinmc FIRST_UP_PARENT/172.20.2.3 application/zip
1251205774.813     55 172.16.1.3 TCP_DENIED/407 1954 GET http://balancer.netdania.com/StreamingServer/StreamingServer? - NONE/- text/html
1251205774.816      0 172.16.1.3 TCP_DENIED/407 1969 GET http://balancer.netdania.com/StreamingServer/StreamingServer? - NONE/- text/html
1251205776.537   1721 172.16.1.3 TCP_MISS/200 1125 GET http://balancer.netdania.com/StreamingServer/StreamingServer? gavinmc FIRST_UP_PARENT/172.20.2.3 application/zip
1251205779.681      1 172.16.1.3 TCP_DENIED/407 1954 GET http://balancer.netdania.com/StreamingServer/StreamingServer? - NONE/- text/html
1251205779.685      1 172.16.1.3 TCP_DENIED/407 1969 GET http://balancer.netdania.com/StreamingServer/StreamingServer? - NONE/- text/html

If I drop the word CONNECT I get no errors at all, but that disables
authentication entirely for that site.

There is definitely some issue with authentication and Java.  I'm not sure
if it might actually be Authentication+Java+SSL.  Our problems are
generally with java-driven online banking applications.

Gavin 




Re: [squid-users] Hardware configuration for Squid that can handle 100 - 200 Mbps

2009-08-27 Thread Gavin McCullagh
Hi,

On Thu, 27 Aug 2009, Chris Robertson wrote:

> It will if you can fill them without overloading the cache index.  Each  
> object in the cache needs to be indexed in memory.  The 10MB of RAM per  
> GB of disk space assumes an average object size of 10KB.  Using that  
> rule of thumb, you'd need 100GB of RAM for a 10TB cache (just for the  
> index)!

This rule of thumb may be reasonable, but the average object size seems too
low to me.  We see an 80KB average object size, which suggests something
more like 1.25MB per GB.  We have 1GB as our disk cache max object size
which doubtless does expand the average.  If you want high byte hit rate
you need to cache large objects.

We have an 800GB cache (2x400GB on two 1TB disks) and although there's 8GB
of ram, it's on a 32-bit operating system (so squid's process size can't go
above 3GB).  The above value suggests the indexes take up about 1GB of RAM.
We have a 1GB memory cache and all runs fine without swapping so the index
must be fitting into 2GB.  

Gavin



Re: [squid-users] Hardware configuration for Squid that can handle 100 - 200 Mbps

2009-08-28 Thread Gavin McCullagh
Hi,

On Fri, 28 Aug 2009, Amos Jeffries wrote:

> Whereas I have two caches. One has an average object size of 64KB the  
> other 128KB. However people from large ISPs are still posting ~10KB avg  
> object sizes in their stats.

Fair enough.  I suppose our students are quite into their assorted video
sites, which must drag the average up.

>> We have an 800GB cache (2x400GB on two 1TB disks) and although there's 8GB
>> of ram, it's on a 32-bit operating system (so squid's process size can't go
>> above 3GB).  The above value suggests the indexes take up about 1GB of RAM.
>> We have a 1GB memory cache and all runs fine without swapping so the index
>> must be fitting into 2GB.  
>
> Its a 4095 MB cap on 32-bit. (4 GB minus 1 KB).

It seems I'm wrong again.  Bah!  Thanks for the correction :-)

http://kerneltrap.org/node/2450

Gavin




Re: [squid-users] Java not working behind squid

2009-09-01 Thread Gavin McCullagh
On Tue, 01 Sep 2009, Truth Seeker wrote:

> Really thanks for your effort... i was not able to get back to you, just
> bcoz there were so many unexpected issues on the proxy...
> 
> Now your resolution didnt worked for me... 
> 
> I didnt even got the 
> http://balancer.netdania.com/StreamingServer/StreamingServer? in my access.log
> 
> rather i could see always DENIED for balancer like the following 
> 
> TCP_DENIED/407 2912 CONNECT balancer.netdania.com:443 - NONE/- text/html

Perhaps you might tell us (ie copy and paste config) exactly what you did.

Gavin



Re: [squid-users] Java not working behind squid

2009-09-01 Thread Gavin McCullagh
Hi,

On Tue, 01 Sep 2009, Tejpal Amin wrote:

> Try putting this acl
> 
> acl Java browser Java/1.4 Java/1.5 Java/1.6
> http_access allow Java
> 
> This worked for me when using NTLauth.

Thanks, though I'm not the one in need of a solution and I'm not that keen
to give Java full unauthenticated browsing rights.  

Perhaps Truth Seeker(?) might try that though.

Am I to understand that Java is just really bad at NTLM auth, so much so
that people just whitelist it for unauthenticated access?

Gavin



Re: [squid-users] Java not working behind squid

2009-09-03 Thread Gavin McCullagh
On Thu, 03 Sep 2009, Truth Seeker wrote:

> >   acl Java browser Java/1.4 Java/1.5 Java/1.6
> >   acl localnet src 192.168.0.1/24
> >   http_access allow localnet Java
> 
> But for me even with the above said acl's its not working. In the client side 
> i tested with 
> a. ubuntu 9.04 box and with firefox 3.0, (here a java window is prompting
> for user/pass and once i given the req info, then it says "Error Details"
> in that Java window in dailyfx.com)
> 
> b. with win XP and firefox and IE (both just given Error details)
> 
> Now what can i do? to get this site working with our env

Does the above http_access come before or after the http_access which
allows people to access when authenticated?

Gavin



[squid-users] squid counters appear to be wrapping on squid v2.6.18 (old I know)

2009-10-05 Thread Gavin McCullagh
Hi,

we're seeing something odd on squid v2.6.18-1ubuntu3.  I know this is an
old version and not recommended but I just thought I'd point it out to make
sure this has been fixed in a more recent version.

After some time running, a couple of squid's counters appear to be
wrapping, like signed 32-bit integers.  In particular:

  client_http.kbytes_out = -2112947050

We noticed this as we use munin, which queries the counters in this way and
ignores negative values.  The select_loops value is also negative.  If this
is fixed in v2.7 that's fair enough but I thought I'd mention it here in
case it isn't.
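
Assuming it really is a signed 32-bit wrap, adding 2^32 back on gives the
true figure:

  -2112947050 + 4294967296 = 2182020246 KB, ie roughly 2TB served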

Gavin


gavi...@watcher:~$ nc localhost 8080
GET cache_object://localhost/counters HTTP/1.0
Accept: */*

HTTP/1.0 200 OK
Server: squid/2.6.STABLE18
Date: Mon, 05 Oct 2009 15:44:17 GMT
Content-Type: text/plain
Expires: Mon, 05 Oct 2009 15:44:17 GMT
Last-Modified: Mon, 05 Oct 2009 15:44:17 GMT
X-Cache: MISS from watcher.gcd.ie
X-Cache-Lookup: MISS from watcher.gcd.ie:8080
Via: 1.0 watcher.gcd.ie:8080 (squid/2.6.STABLE18)
Proxy-Connection: close

sample_time = 1254757456.12734 (Mon, 05 Oct 2009 15:44:16 GMT)
client_http.requests = 63518961
client_http.hits = 27155728
client_http.errors = 5191
client_http.kbytes_in = 77340031
client_http.kbytes_out = -2112947050
client_http.hit_kbytes_out = 261721929
server.all.requests = 36663719
server.all.errors = 0
server.all.kbytes_in = 1908075341
server.all.kbytes_out = 61829714
server.http.requests = 36233326
server.http.errors = 0
server.http.kbytes_in = 1901156005
server.http.kbytes_out = 59068791
server.ftp.requests = 28
server.ftp.errors = 0
server.ftp.kbytes_in = 941732
server.ftp.kbytes_out = 4
server.other.requests = 430365
server.other.errors = 0
server.other.kbytes_in = 5977603
server.other.kbytes_out = 2760918
icp.pkts_sent = 0
icp.pkts_recv = 0
icp.queries_sent = 0
icp.replies_sent = 0
icp.queries_recv = 0
icp.replies_recv = 0
icp.query_timeouts = 0
icp.replies_queued = 0
icp.kbytes_sent = 0
icp.kbytes_recv = 0
icp.q_kbytes_sent = 0
icp.r_kbytes_sent = 0
icp.q_kbytes_recv = 0
icp.r_kbytes_recv = 0
icp.times_used = 0
cd.times_used = 0
cd.msgs_sent = 36  
cd.msgs_recv = 36
cd.memory = 0
cd.local_memory = 6481
cd.kbytes_sent = 3
cd.kbytes_recv = 48
unlink.requests = 0
page_faults = 6
select_loops = -1656175576
cpu_time = 78423.85
wall_time = 1.478176
swap.outs = 12350417
swap.ins = 39255680
swap.files_cleaned = 1158
aborted_requests = 1300408



[squid-users] (mis)understanding delay pools?

2009-10-10 Thread Gavin McCullagh
Hi,

we're running the packaged Squid on Ubuntu LTS (2.6.18-1ubuntu3).

I've configured squid with a delay pool as follows:

  acl accommclients_old   src 10.2.0.0/16
  acl accommclients   src 172.17.0.0/20
  acl studentclients  src 172.18.0.0/16
  acl studentwificlients  src 172.19.0.0/23
  acl summerschoolclients src 172.19.4.0/24

  delay_pools 1
  delay_class 1 1
  delay_access 1 allow accommclients accommclients_old studentclients 
studentwificlients
  delay_access 1 deny all
  delay_parameters 1 250/250

which I thought would limit all of those IP ranges (together) to
2.5MByte/sec, ie 20Mbit/sec.

However, the other day, we found squid consuming over 40Mbit/sec.  This was
mostly down to one individual user who was within the 172.17/20 range who
was consuming well over 30Mb/sec.

I presume I'm doing something wrong.  Could someone point out my mistake or
suggest something I should look at?

Gavin



Re: [squid-users] (mis)understanding delay pools?

2009-10-11 Thread Gavin McCullagh
On Sun, 11 Oct 2009, Amos Jeffries wrote:

> See
> http://wiki.squid-cache.org/SquidFaq/SquidAcl#Common_Mistakes
>
> The YOU/ME example mistake is exactly the one you have made above.

How infuriating.  Thanks very much for pointing it out to me :-)

Gavin




Re: [squid-users] (mis)understanding delay pools?

2009-10-11 Thread Gavin McCullagh
Hi,

just a further question on this.  

On Sun, 11 Oct 2009, Amos Jeffries wrote:

>>   acl accommclients_old   src 10.2.0.0/16
>>   acl accommclients   src 172.17.0.0/20
>>   acl studentclients  src 172.18.0.0/16
>>   acl studentwificlients  src 172.19.0.0/23
>>   acl summerschoolclients src 172.19.4.0/24

>>   delay_access 1 allow accommclients accommclients_old studentclients 
>> studentwificlients
>
> See
> http://wiki.squid-cache.org/SquidFaq/SquidAcl#Common_Mistakes
>
> The YOU/ME example mistake is exactly the one you have made above.

I feel pretty stupid falling for such a bog-standard mistake and I'm
annoyed at myself that it has been in place for some months now.
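
For the archives, the fix is one allow line per ACL (which ORs them)
rather than all four ACLs on one line (which ANDs them):

  delay_access 1 allow accommclients
  delay_access 1 allow accommclients_old
  delay_access 1 allow studentclients
  delay_access 1 allow studentwificlients
  delay_access 1 deny all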

It strikes me that, in this case, the mistake led to an internally
contradictory (multiple times over!) config.  It couldn't possibly have
been correct.  Would it be practical for squid to give a warning in this
instance?

I'm not saying squid should necessarily molly-coddle its users, but if it
weren't difficult to do perhaps it would lead to a greater degree of people
spotting their own mistakes early (before they use it for months thinking
it's working or give up confused or ask the mailing list).  Compilers, for
example, do a certain amount of this kind of thing which often prevents
bugs in code.

Just looking at the FAQ page it might be nice to warn on:

 - An _access combination of ACLs which cannot match anything (eg colour is
   black and colour is white)
 - An _access which comes after one which is more general than it (eg allow
   all red colours; deny pink)
 - Possibly suggest use of src instead of srcdomain (though this is probably
   not wrong in some instances)

though there are probably others.

Perhaps this has been suggested before or perhaps there are good reasons
not to do it?  Perhaps it's already there and I haven't spotted it?

Gavin



Re: [squid-users] Collapsed Forwarding was Re: [squid-users] Kerberos authentication & pre-caching in Squid for Windows

2009-10-25 Thread Gavin McCullagh
Hi,

does anyone have a definitive answer on this?  I get the
"collapsed request STALE!" message fairly regularly
but haven't noted any problems.  That said, I'm
not sure I would get a complaint if there were a problem.  If it's a
harmless issue, I'll add it to our logcheck database -- and possibly submit
a patch to logcheck to add it to their squid ignore patterns.

Thanks,
Gavin

On Sat, 08 Aug 2009, Gavin McCullagh wrote:

> On Sat, 08 Aug 2009, Amos Jeffries wrote:
> 
> > In a school situation you will also find the collapsed_forwarding  
> > features of Squid very useful. It can reduce/collapse a full classroom  
> > worth of duplicate requests for the same lesson website, down to a set  
> > of single requests to fetch the page once.
> 
> Interested in this benefit I enabled it on our squid 2.6.18 server.
> 
> I haven't tried to measure the effects yet, but I do note some of these
> entries in the logs:
> 
>   Aug  8 19:31:33 watcher squid[6908]: clientProcessExpired: collapsed 
> request STALE!
> 
> Is this normal or does it indicate anything bad?
> 
> Thanks in advance,
> Gavin




Re: [squid-users] Kerberos authentication & pre-caching in Squid for Windows

2009-10-26 Thread Gavin McCullagh
On Sat, 08 Aug 2009, Amos Jeffries wrote:

> In a school situation you will also find the collapsed_forwarding  
> features of Squid very useful. It can reduce/collapse a full classroom  
> worth of duplicate requests for the same lesson website, down to a set  
> of single requests to fetch the page once.

I don't mean to be smart here but the feature page says:

  http://wiki.squid-cache.org/Features/CollapsedForwarding

 "To remedy this situation this patch adds a new tuning knob to squid.conf,
  making Squid delay further requests while a cache revalidation or cache
  miss is being resolved. This sacrifices general proxy latency in favor for
  accelerator performance and thus should not be enabled unless you are
  running an accelerator."

So is collapsed forwarding generally a bad idea for a forward proxy?

Gavin



Re: [squid-users] Time-based oddity that I can't quite nail down...

2009-11-09 Thread Gavin McCullagh
Hi,

On Sun, 08 Nov 2009, Kurt Buff wrote:

> During the normal work day at my company, the squid proxy is
> reasonably responsive, and seems to work well.  However, after roughly
> 5pm each day, through the night and all during the weekend, web browsing
> is very slow, with pages taking a very long time (30+ seconds, to
> sometimes minutes) to load.
> 
> Does anyone have some suggestions on where I might start looking at
> this problem? I haven't found anything in the logs that I can detect
> as relevant. Stopping and starting squid makes no difference.

The first thing I'd be inclined to try is connecting directly, bypassing
squid, to see whether the problem goes away.  If it's quick to connect directly,
then squid is probably where the issue is.  If you still see the delays
going direct then it's probably something else (eg high contention on your
link).
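A crude way to compare the two, assuming you have curl on a client machine
(the URL is just an example):

  time curl -s -o /dev/null http://www.example.com/
  time curl -s -o /dev/null -x http://your-proxy:8080 http://www.example.com/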

Gavin



[squid-users] problems with squid_ldap_auth

2009-11-10 Thread Gavin McCullagh
Hi,

I've been trying to get squid_ldap_auth to work on a debian lenny box here
using the packaged squid, which is as follows:


gavi...@muinnamuice:~$ sudo squid -v
Squid Cache: Version 2.7.STABLE3
configure options:  '--prefix=/usr' '--exec_prefix=/usr' '--bindir=/usr/sbin' 
'--sbindir=/usr/sbin' '--libexecdir=/usr/lib/squid' '--sysconfdir=/etc/squid' 
'--localstatedir=/var/spool/squid' '--datadir=/usr/share/squid' 
'--enable-async-io' '--with-pthreads' 
'--enable-storeio=ufs,aufs,coss,diskd,null' '--enable-linux-netfilter' 
'--enable-arp-acl' '--enable-epoll' '--enable-removal-policies=lru,heap' 
'--enable-snmp' '--enable-delay-pools' '--enable-htcp' '--enable-cache-digests' 
'--enable-underscores' '--enable-referer-log' '--enable-useragent-log' 
'--enable-auth=basic,digest,ntlm,negotiate' 
'--enable-negotiate-auth-helpers=squid_kerb_auth' '--enable-carp' 
'--enable-follow-x-forwarded-for' '--with-large-files' '--with-maxfd=65536' 
'i386-debian-linux' 'build_alias=i386-debian-linux' 
'host_alias=i386-debian-linux' 'target_alias=i386-debian-linux' 'CFLAGS=-Wall 
-g -O2' 'LDFLAGS=' 'CPPFLAGS=' 


I've been following a set of tutorials such as 

http://www.grolmsnet.de/kerbtut/

http://klaubert.wordpress.com/2008/01/09/squid-kerberos-authentication-and-ldap-authorization-in-active-directory/
http://wiki.squid-cache.org/ConfigExamples/Authenticate/Kerberos
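For reference, the glue those tutorials converge on looks roughly like this
(a sketch; the helper path is the Debian one, the keytab is the file above,
and -s is squid_kerb_auth's service-principal option):

  # in squid's startup environment, point the helper at the keytab:
  export KRB5_KTNAME=/etc/squid/squid_muinnamuice.krb5keytab

  # squid.conf:
  auth_param negotiate program /usr/lib/squid/squid_kerb_auth -s SQUID/muinnamuice.staff.gcd.ie
  auth_param negotiate children 10
  auth_param negotiate keep_alive on
  acl kerb_auth proxy_auth REQUIRED
  http_access allow kerb_auth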

I've set up the keytab, created on the Windows server and transferred over,
and all seems to be working:


gavi...@muinnamuice:~$ sudo -u proxy kinit -V -k -t 
/etc/squid/squid_muinnamuice.krb5keytab 
SQUID/muinnamuice.staff.gcd...@staff.gcd.ie
Authenticated to Kerberos v5
gavi...@muinnamuice:~$ klist -e
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: SQUID/muinnamuice.staff.gcd...@staff.gcd.ie

Valid starting       Expires              Service principal
11/10/09 18:00:57  11/11/09 00:40:57  krbtgt/staff.gcd...@staff.gcd.ie
Etype (skey, tkt): ArcFour with HMAC/md5, ArcFour with HMAC/md5 


Kerberos 4 ticket cache: /tmp/tkt1000
klist: You have no tickets cached
gavi...@muinnamuice:~$ kvno SQUID/muinnamuice.staff.gcd...@staff.gcd.ie
SQUID/muinnamuice.staff.gcd...@staff.gcd.ie: kvno = 3


However, using IE8 which requires ldap auth, authentication seems to be
failing.  Below is the output with debug level 3 in squid.


2009/11/10 18:23:31| Parser: retval 1: from 0->40: method 0->2; url 4->29; 
version 31->39 (1/1)
2009/11/10 18:23:31| Parser: retval 1: from 0->40: method 0->2; url 4->29; 
version 31->39 (1/1)
2009/11/10 18:23:31| squid_kerb_auth: Got 'YR 
YIIF9wYGKwYBBQUCoIIF6zCCBeegMDAuBgkqhkiC9xIBAgIGCSqGSIb3EgECAgYKKwYBBAGCNwICHgYKKwYBBAGCNwICCqKCBbEEggWtYIIFqQYJKoZIhvcSAQICAQBuggWYMIIFlKADAgEFoQMCAQ6iBwMFACCjggSGYYIEgjCCBH6gAwIBBaEOGwxTVEFGRi5HQ0QuSUWiKzApoAMCAQKhIjAgGwRIVFRQGxhtdWlubmFtdWljZS5zdGFmZi5nY2QuaWWjggQ4MIIENKADAgEXoQMCAQKiggQmBIIEIlFvkWb8/ir66BtS/JRa4PkFzvR933EJLhmawTrp1zUPylFzUyBx7RitmQvcNTZ4ZI8Pre1MKsGeRzKUcbZBGD6q1dMga1npFmLz7oIVwIjXiFo+uVD9t8ZI+OhnaIC4mnWR3Zsavas5e5bbwRYclTkx7j3OCbJcZzlGwjOjTu0n7EAkbQhBt7QeHMDsAOk/M1UfwY+Gtrx9W89+2sxqScsProjv2lPKCr/u4QyK9T1jG2QrP8ImjSDZG+3MAmjjpenxwC/VFRPNVAC1SR4U1gqeYKONT3IYLdYnZkfusHxBpVSJ7oHhfUMNXlKe9nah2lCDDoMgvnw2pxvskxguQ45yQ19YWMRY3LG1MOIXQLWO+b+tcJuB1DE/7XIQiwBTXwjTWfJg8dkB9pmQKuDIG6giLWaHboUQ/hH4jrVZetXAq9fbP2slyzikerBLSVu0N8sKgdNJXZWgkelwsXBqXkHbiwvZXupkrMuqHybNrMUCfszU5Ifuew5fzrO7v93saFl6Qv19zqzCH54TCczURkZtqFpSIcqqRVHwA/pT+xfr2lx6Tpg4AjJ4rqzuXrut/qJzCtrBS7StpkIzn1FEsrhYWvLHXKv69AEmAE9d+B33J/pWzUMPZ7XhycSq7Ay+pUKyAz6t2mf0y5bOFSBn7N2SNLKFIV4TaClmMwMX7VIk3+Kaf7f6v6j77H9E7XBcLZrfqXRnRRXljRArC661ETxTaeMm90f5fVzIxD1AqQLlbasu6AZ+7zBSiJZflzkqHWINPWxeU/VqvNggjQor6uKlJz0l2gipyBSjuLoi/HVA8da3Eu6XPyx4oP4APAHE/Cyvx83E4mBlZeEy9dJMk5dWmX3Bnr1qEeN+o5BjuzbQRlI2uWYZEy3TjFl4TpduJ4XO6DcUXGtN6Fg2UxWZrs7tvc8vhBy4Twq/tYO69yCnYkJLI1DzqEk2joyZh33j3KwYqA5VHbqve3gsj+9Ft+XIlpxdkJ2JEYB7Dq50qekyv1ozvo4wr9aGI3E7HTij4wUJP10HRxg3tFQs8rWdZT2r08Zon0xlLPXta5rTGzj99Dn63TB9E1YA6Q0obfZIE+uylhXj3cK/T4q09RfJiojE/T9BQiTG6HVvrvQm+RHwlPsa+6yZqp6oBODfDNHOH2iGBYGl//SNjpZAt6B2RsMItdHS/Q1v/JRJD3+xZUwWm8kspJMUWr+sPm0BpEQtykjQhcJBdLMpFFCJlL7tGYhHzEb92hnuMIZwRoL0JROH+kGxahmkHI+Oj6sCeewfxWxGgKAO5aV8gSjJTJaVlWJGk6s6KQ/onjmAKsfzcXY83ZZSuQ9LjJbIP4Gf99jJOeZEW44x8W+X6yevJPDiFeXxkBanWUAwJF5q+4vPeD9LbOHnZ0L7Px3WBO9fjTyC1yYxpIH0MIHxoAMCAReigekEgeZ2GzgZue9olCJji0cIUE6we4nigI5cLAr8Xm4GFisky95FyvuSmKdYzFk53Q3VsSuLOcG0e5y3Z4QeT0dFr0UtF9qmX5jGYOWfrIzEduuiTXpdwi3J4gbQokTcVyhv8T8t99SSooI0nbKsR9sEVHYfvCdpbb64dN0asBX3IBmsdRVOCdyuzlBnT4X7Jz6YS09Fo+IL2ixM6XD9VrNHWi5BaSsrZBo0g4ghGw8g/7M7gvX3Osgo0fyzVGINlBZISlQhIOKdS9MrI8EtWtgrIDMaqZ

Re: [squid-users] Your opinion: Squid and MySQL interaction, for redirecting users? Is it possible?

2009-11-22 Thread Gavin McCullagh
Hi,

On Sun, 22 Nov 2009, Jonas Brunsgaard wrote:

> I have the squid, i have the MySQL data. But how do I make the Squid
> distinguish between blocked and not blocked. And further how do I
> redirect blocked users to the internal "you are a bad boy" page.

We do something like this by keeping the blocked IPs in a file.

  acl blockedclients  src "/etc/squid/blocked_ips"
  http_access deny  blockedclients
  deny_info http://www.our.tld/temp/message.html blockedclients

A simple method might be to just regenerate the file once an hour on a cron
job?
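Something like this, perhaps (a sketch; the database, table and credentials
are made up):

  #!/bin/sh
  # hypothetical /etc/cron.hourly/update-blocked-ips
  mysql -N -u squid -pSECRET blockdb \
        -e 'SELECT ip FROM blocked_users' > /etc/squid/blocked_ips.new \
    && mv /etc/squid/blocked_ips.new /etc/squid/blocked_ips \
    && squid -k reconfigure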

Gavin



[squid-users] help with squid error

2010-04-12 Thread Gavin McCullagh
Hi,

Could someone perhaps give me a clue as to the reason for the following
"417 Expectation Failed" error I'm getting back from squid.  This is an
online video system with a flash player and it would appear to be flash
making a direct HTTP connection through the proxy.

Request packet fragment and wiresharked response are below.

Many thanks in advance,
Gavin

The request is (sorry it's not in a better format):


  00 0d 56 5e b5 00 00 1a a0 8c 7d 15 08 00 45 00   ..V^..}...E.
0010  01 4b 0d 16 40 00 40 06 c2 1e ac 10 01 03 ac 10   @.@.
0020  11 55 8b 45 1f 90 4b c1 5e 20 1d d6 8b 1e 80 18   .U.E..K.^ ..
0030  00 5c 6b b6 00 00 01 01 08 0a 00 16 53 0b 26 cb   .\k.S.&.
0040  55 7a 50 4f 53 54 20 68 74 74 70 3a 2f 2f 38 39   UzPOST http://89
0050  2e 32 30 37 2e 35 36 2e 31 30 37 2f 73 65 6e 64   .207.56.107/send
0060  2f 65 70 31 6d 62 52 51 75 36 48 58 57 56 59 7a   /ep1mbRQu6HXWVYz
0070  49 2f 31 20 48 54 54 50 2f 31 2e 31 0d 0a 48 6f   I/1 HTTP/1.1..Ho
0080  73 74 3a 20 38 39 2e 32 30 37 2e 35 36 2e 31 30   st: 89.207.56.10
0090  37 0d 0a 41 63 63 65 70 74 3a 20 2a 2f 2a 0d 0a   7..Accept: */*..
00a0  50 72 6f 78 79 2d 43 6f 6e 6e 65 63 74 69 6f 6e   Proxy-Connection
00b0  3a 20 4b 65 65 70 2d 41 6c 69 76 65 0d 0a 55 73   : Keep-Alive..Us
00c0  65 72 2d 41 67 65 6e 74 3a 20 53 68 6f 63 6b 77   er-Agent: Shockw
00d0  61 76 65 20 46 6c 61 73 68 0a 43 6f 6e 6e 65 63   ave Flash.Connec
00e0  74 69 6f 6e 3a 20 4b 65 65 70 2d 41 6c 69 76 65   tion: Keep-Alive
00f0  0a 43 61 63 68 65 2d 43 6f 6e 74 72 6f 6c 3a 20   .Cache-Control: 
0100  6e 6f 2d 63 61 63 68 65 0d 0a 43 6f 6e 74 65 6e   no-cache..Conten
0110  74 2d 54 79 70 65 3a 20 61 70 70 6c 69 63 61 74   t-Type: applicat
0120  69 6f 6e 2f 78 2d 66 63 73 0d 0a 43 6f 6e 74 65   ion/x-fcs..Conte
0130  6e 74 2d 4c 65 6e 67 74 68 3a 20 31 35 33 37 0d   nt-Length: 1537.
0140  0a 45 78 70 65 63 74 3a 20 31 30 30 2d 63 6f 6e   .Expect: 100-con
0150  74 69 6e 75 65 0d 0a 0d 0atinue


The response:

Hypertext Transfer Protocol
HTTP/1.0 417 Expectation failed\r\n
[Expert Info (Chat/Sequence): HTTP/1.0 417 Expectation failed\r\n]
[Message: HTTP/1.0 417 Expectation failed\r\n]
[Severity level: Chat]
[Group: Sequence]
Request Version: HTTP/1.0
Response Code: 417
Server: squid/2.7.STABLE3\r\n
Date: Mon, 12 Apr 2010 10:16:26 GMT\r\n
Content-Type: text/html\r\n
Content-Length: 1451\r\n
[Content length: 1451]
Expires: Mon, 12 Apr 2010 10:16:26 GMT\r\n
X-Squid-Error: ERR_INVALID_REQ 0\r\n
X-Cache: MISS from muinnamuice.staff.gcd.ie\r\n
X-Cache-Lookup: NONE from muinnamuice.staff.gcd.ie:8080\r\n
Via: 1.0 muinnamuice.staff.gcd.ie:8080 (squid/2.7.STABLE3)\r\n
Connection: close\r\n
\r\n
Line-based text data: text/html

ERROR: The requested URL could not be retrieved

While trying to process the request:

  POST /send/ep1mbRQu6HXWVYzI/1 HTTP/1.1
  Host: 89.207.56.107
  Accept: */*
  Proxy-Connection: Keep-Alive
  User-Agent: Shockwave Flash
  Connection: Keep-Alive
  Cache-Control: no-cache
  Content-Type: application/x-fcs
  Content-Length: 1537
  Expect: 100-continue

The following error was encountered:

  Invalid Request

Some aspect of the HTTP Request is invalid.  Possible problems:

  - Missing or unknown request method
  - Missing URL
  - Missing HTTP Identifier (HTTP/1.0)
  - Request is too large
  - Content-Length missing for POST or PUT requests
  - Illegal character in hostname; underscores are not allowed

Your cache administrator is helpd...@gcd.ie.

Generated Mon, 12 Apr 2010 10:16:26 GMT by muinnamuice.staff.gcd.ie
(squid/2.7.STABLE3)



Re: [squid-users] help with squid error

2010-04-12 Thread Gavin McCullagh
Hi Amos,

On Tue, 13 Apr 2010, Amos Jeffries wrote:

> Squid is following RFC 2616 requirements.  When HTTP/1.1 request
> containing "Expect: 100-continue" is going to pass through a
> HTTP/1.0 proxy or server which can't handle the "100" status
> messages a "417" message MUST be sent back instead.
> 
> The expected result is that the client software will retry
> immediately without the "Expect: 100-continue" conditions. Failing
> that it's probably broken software.

I see, thanks.  I've been trying to track down why I can't get the flash
player video to play.

While this failure is mixed in there, I think this may be a red herring for
the problem I'm experiencing.  Others going through the same proxy have had
success.
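For anyone else hitting this: it's easy to test from the command line with
curl (the payload file below is hypothetical); passing an empty Expect
header stops curl adding 100-continue to large POSTs:

  curl -x http://muinnamuice.staff.gcd.ie:8080 -H 'Expect:' \
       -H 'Content-Type: application/x-fcs' --data-binary @payload.bin \
       http://89.207.56.107/send/ep1mbRQu6HXWVYzI/1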

Thanks again for your help,
Gavin



[squid-users] squid on 32-bit system with PAE and 8GB RAM

2009-03-16 Thread Gavin McCullagh
Hi,

we're running a reasonably busy squid proxy system here which peaks at
about 130-150 requests per second.  

The OS is Ubuntu Hardy and at the minute, I'm using the packaged 2.6.18
squid version.  I'm considering a hand-compile of 2.7, though it's quite
nice to get security patches from the distro. 

We have 2x SATA disks, a 150GB and a 1TB.  The linux system is on software
RAID1 across the two disks.  The main cache is 600GB in size on a single
non-RAID 970GB partition at the end of the 1TB disk.  A smaller partition
is reserved on the other disk as a secondary cache, but that's not in use
yet and the squid logs are currently written there.  The filesystems for
the caches are reiserfs v3 and the cache format is AUFS. 

We've been monitoring the hit rates, cpu usage, etc. using munin.   We
average about 13% byte hit rate.  Iowait is now a big issue -- perhaps not
surprisingly.  I had 4GB RAM in the server and PAE turned on.  I upped this
to 8GB with the idea of expanding squid's RAM cache.  Of course, I forgot
that the squid process can't address anything like that much RAM on a
32-bit system.  I think the limit is about 3GB, right?

I have two questions.  Whenever I up the cache_mem beyond about 2GB, I
notice squid terminates with signal 6 and restarts as the cache_mem fills.
I presume this is squid hitting the 3GB-odd limit?  Could squid not behave
a little more politely in this situation -- either not attempting to
allocate the extra RAM, giving a warning or an error?

My main question is, is there a sensible way for me to use the extra RAM?
I know the OS does disk caching with it but with a 600GB cache, I doubt
that'll be much help.  I thought of creating a 3-4GB ramdisk and using it
as a volatile cache for squid which gets re-created (either by squid -z or
by dd of an fs image) each time the machine reboots.  The thing is, I
don't know how squid addresses multiple caches.  If one cache is _much_
faster but smaller than the other, can squid prioritise using it for the
most regularly hit data or does it simply treat each cache as equal?  Are
there docs on these sorts of issues?
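For what it's worth, the ramdisk version would look something like this
(a sketch, untested; the sizes and mount point are illustrative):

  # a 4GB tmpfs for a volatile cache
  mount -t tmpfs -o size=4g,mode=750 tmpfs /var/spool/squid-ram
  chown proxy:proxy /var/spool/squid-ram

  # squid.conf: a small fast cache_dir alongside the big one (sizes in MB)
  cache_dir aufs /var/spool/squid-ram 3500 16 256
  cache_dir aufs /var/spool/squid 600000 16 256

  # at each boot, recreate the swap directories before starting squid
  squid -z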

Any suggestions would be most welcome.

Gavin



Re: [squid-users] squid on 32-bit system with PAE and 8GB RAM

2009-03-16 Thread Gavin McCullagh
Hi,

thanks for the reply.

On Tue, 17 Mar 2009, Amos Jeffries wrote:

> FYI: The latest Intrepid or Jaunty package should work just as well in  
> Hardy.

I'll look into this.  I tried to build the intrepid debian package from
source, but I came across a build dependency which was apparently not
available on hardy: libgssglue-dev.  I'll look into installing the
pre-built package, but I would've thought it would need newer versions of
libraries.

In general, I'm looking for simple maintenance and patching, but not at the
expense of too much performance.  Would we benefit much from a hand-built
squid install?  In what way?

>> Of course, I forgot that the squid process can't address anything like
>> that much RAM on a 32-bit system.  I think the limit is about 3GB,
>> right?
>
> For 32-bit I think it is yes. You can rebuild squid as 64-bit or check  
> the distro for a 64-bit build.

The server hardware isn't 64-bit so surely I can't run a 64-bit squid
build, can I?

> However keep this in mind:  rule-of-thumb is 10MB index per GB of cache.
>
> So your 600 GB disk cache is likely to use ~6GB of RAM for index +  
> whatever cache_mem you allocate for RAM-cache + index for RAM-cache + OS  
> and application memory.

Ouch.  That's not a rule of thumb I'd seen anywhere.  I'm really not
observing it either.  Squid runs stably for days with a 1.7GB cache_mem
and a 600GB disk cache. 

It may help that we're allowing large objects into the cache and using
"heap lfuda".  We plot the average object size with munin and it's about
90KB.  Presumably the 10MB per 1GB is strongly a function of average object
size.  
http://deathcab.gcd.ie/munin/gcd.ie/watcher.gcd.ie.html#Squid

The drops in RAM usage are all due to squid restarting.  As long as I keep
the cache_mem below about 1.8-2GB, it seems to stay up.

>> I have two questions.  Whenever I up the cache_mem beyond about 2GB, I
>> notice squid terminates with signal 6 and restarts as the cache_mem fills.
>> I presume this is squid hitting the 3GB-odd limit?  Could squid not behave
>> a little more politely in this situation -- either not attempting to
>> allocate the extra RAM, giving a warning or an error?
>
> cache.log should contain a FATAL: message and possibly a line or two  
> beforehand about why and where the crash occured.
> Please can you post that info here.

My apologies, there is a useful error, though in syslog not cache.log.

Mar 15 22:50:24 watcher squid[6751]: httpReadReply: Excess data from "POST 
http://im.studivz.net/webx/re"
Mar 15 22:52:50 watcher squid[6748]: Squid Parent: child process 6751 exited 
due to signal 6
Mar 15 22:52:53 watcher squid[4206]: Starting Squid Cache version 2.6.STABLE18 
for i386-debian-linux-gnu...
Mar 15 22:52:53 watcher squid[4206]: Store logging disabled
Mar 15 22:52:53 watcher squid[4206]: Rebuilding storage in 
/var/spool/squid/cache2 (DIRTY)
Mar 15 22:54:29 watcher squid[4206]:   262144 Entries Validated so far.
Mar 15 22:54:29 watcher squid[4206]:   524288 Entries Validated so far.

I read this before and missed the "out of memory" error which appears in
the syslog:

Mar 15 22:52:50 watcher out of memory [6751]

this seems to happen every time:

Mar 10 11:58:12 watcher out of memory [22646]
Mar 10 17:52:03 watcher out of memory [24620]
Mar 11 00:57:52 watcher out of memory [31626]

>> My main question is, is there a sensible way for me to use the extra RAM?
>> I know the OS does disk caching with it but with a 600GB cache, I doubt
>> that'll be much help.
>
> RAM swapping (disk caching by the OS) is one major performance killer.  
> Squid needs direct access to all its memory for fast index searches and  
> in-transit processing.

Of course.  We definitely don't see any swapping to disk.  I watch our
munin memory graphs carefully for this.  What I mean is that the linux OS
does the opposite where RAM is unused -- it caches data in RAM, reads ahead
open files, etc. but this probably won't help much where the amount of data
on disk is very large.

http://deathcab.gcd.ie/munin/gcd.ie/watcher.gcd.ie.html#System

>> I thought of creating a 3-4GB ramdisk and using it
>> as a volatile cache for squid which gets re-created (either by squid -z or
>> by dd of an fs image) each time the machine reboots.   The things is, I
>> don't know how squid addresses multiple caches.  If one cache is _much_
>> faster but smaller than the other, can squid prioritise using it for the
>> most regularly hit data or does it simply treat each cache as equal?  Are
>> there docs on these sorts of issues?
>
> No need that is already built into Squid. cache_mem defines the amount  
> of RAM-cache Squid uses.

Right, but if the squid process is hitting its 32-bit memory limit, I
can't increase this any more, can I?  This is why I'm suggesting a ramdisk
cache as that won't expand squid's internal memory usage.

> Squid allocates the disk space based on free space and attempts to  
> spread the load evenly over all dirs to minimize disk access/seek times.

Re: [squid-users] squid on 32-bit system with PAE and 8GB RAM

2009-03-16 Thread Gavin McCullagh
Hi,

On Tue, 17 Mar 2009, Amos Jeffries wrote:

>> The server hardware isn't 64-bit so surely I can't run a 64-bit squid
>> build, can I?
>
> Ah, no I believe thats a problem. I kind of assumed that since your  
> system could take >2GB of RAM it was 64-bit enabled hardware.

Ah well.

>> It may help that we're allowing large objects into the cache and using
>> "heap lfuda".  We plot the average object size with munin and it's about
>> 90KB.  Presumably the 10MB per 1GB is strongly a function of average object
>> size.
>   http://deathcab.gcd.ie/munin/gcd.ie/watcher.gcd.ie.html#Squid
>
> (sites restricted, but never mind)

Sorry, that's fixed now if you want a look.

> Yes the rule-of-thumb was from past measures mediated by object size  
> (averages between 64KB and 128KB). I'm surprised you are seeing such a  
> low index size.

It is quite strange -- 90KB is bang in the middle of your range.  We're
getting about 5-10% of cache hits served from RAM too so there's definitely
cache_mem in use.

>> The drops in RAM usage are all due to squid restarting.  As long as I keep
>> the cache_mem below about 1.8-2GB
>
> Maybe the large-file changes in 2.7 will help then.

That's interesting.

>> Sorry, some of you may be scratching your heads and wondering why one would
>> do something so crazy.  I've just got 4GB RAM sitting moreorless idle,
>> a really busy disk and would like to use one to help the other :-)
>
> Aha, in that case maybe. It would be an interesting setup anyhow.

I'll give it some thought, but it sounds like there's no other easy way to
use the spare RAM.

Gavin



Re: [squid-users] squid on 32-bit system with PAE and 8GB RAM

2009-03-16 Thread Gavin McCullagh
Hi,

On Mon, 16 Mar 2009, Marcello Romani wrote:

> From my little experience I would suggest that you give squid cache_mem
> a value of just some hundreds of MBs, and leave the other GBs of RAM to
> squid for indexes and to the OS for disk caching.  I guess after some time
> this will take you near a ramdisk-only setup.

Really?  I would have thought the linux kernel's disk caching would be far
less optimised for this than using a large squid cache_mem (whatever about
a ramdisk).

> Also, this would move the problem of accessing a very large ram address  
> space from squid (which being only 32-bit can lead to problems) to the  
> OS, which IMHO is better suited for this task.

It's starting to look that way alright.

> Also, I don't understand why you'd spend so much on memory instead of
> buying some more spindles to have a more balanced server in the end
> (maybe space constraints?)

The cost of 8GB of ram was about €100, so it was relatively cheap.  As you
guessed, the machine itself is 1U and doesn't have space for any more hard
drives.

Gavin



Re: [squid-users] squid on 32-bit system with PAE and 8GB RAM

2009-03-17 Thread Gavin McCullagh
Hi,

I don't mean to labour this, I'm just keen to understand better and
obviously you guys are the experts on squid.

On Mon, 16 Mar 2009, Marcello Romani wrote:

>> Really?  I would have thought the linux kernel's disk caching would be far
>> less optimised for this than using a large squid cache_mem (whatever about
>> a ramdisk).
>
> As others have pointed out, squid's cache_mem is not used to serve  
> on-disk cache objects, while os's disk cache will hold those objects in  
> RAM after squid requests them for the first time.

Agreed.  I would have thought though that a large cache_mem would be a
better way to increase the data served from RAM, compared to the OS disk
caching.  

I imagine, perhaps incorrectly, that squid uses the mem_cache first for
data, then when it's removed (by LRU or whatever), pushes it out to the
disk cache.  This sounds like it should lead to a pretty good
mem_cache:disk_cache serving ratio.  I don't have much to back this up, but
the ratio in my own case is pretty high so squid appears not to just treat
all caches (memory and disk) equally.

http://deathcab.gcd.ie/munin/gcd.ie/watcher.gcd.ie-squid_cache_hit_breakdown.html

By comparison, I would expect linux's disk caching, which has no
understanding of the fact that this is a web proxy cache, to be less smart.
Perhaps that's incorrect though, I'm not sure what mechanism linux uses.

> So if you leave most of RAM to OS for disk cache you'll end up having  
> many on-disk object loaded from RAM, i.e. very quickly.

Some, but I would imagine not as many as with mem_cache.

> Also, squid needs memory besides cache_mem, for its own internal  
> structures and for managing the on-disk repository. If its address space  
> is already almost filled up by cache_mem alone, it might have problems  
> allocating its own memory structures.

Absolutely agreed, and the crashes I've seen appear to be caused by this,
though dropping to around 1.7GB mem_cache seems to cure it.

The question then is, which would be better, an extra cache based on a
ramdisk, or just leaving it up to the kernel's disk caching.  

> OS's disk cache, on the other hand, is not allcated from squid's process  
> memory space and has also a variable size, automatically adjusted by the  
> OS when app memory needs grow or shrink.

Right.  A ramdisk is not allocated from squid's process space either,
but it doesn't shrink in the way linux disk caching would and that might
cause swapping in a bad situation.  That's a clear advantage for linux's
caching.  Simplicity is another clear advantage.

The question I'm left with is, which of the two would better optimise the
amount of data served from ram (thus lowering iowait), linux's caching or
the ramdisk?

I guess it's not a very normal setup, so maybe nobody has done this.

Thanks for all the feedback,
Gavin




Re: [squid-users] R: Re: [squid-users] Squid + Antivirus

2009-03-23 Thread Gavin McCullagh
Hi,

On Mon, 23 Mar 2009, projpr...@libero.it wrote:

> Amos wrote

> >I've heard people mention good stuff about squid integration with 
> >ClamAV, HAVP, NOD32.

> >> I'm really satisfied with squid, and I'd like to finally bring this project 
> >> out of the test phase.  Thanks in advance to everybody.
> >
> >Indeed I'd like to get this into the wiki ASAP.
> >
> >Is anyone able to either add a ConfigExamples page for it or publish 
> >details here?

I was looking into this recently.  Using HAVP as a parent proxy to squid
seems to be the way to go, and it's documented here:

http://www.server-side.de/ideas.htm
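From what I've read, the squid-side glue is only a couple of lines (a
sketch; assuming HAVP listens on 127.0.0.1:8080):

  cache_peer 127.0.0.1 parent 8080 0 no-query no-digest default
  never_direct allow all    # force everything through the HAVP parent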

I haven't tried it yet though.

Gavin



Re: [squid-users] Squid, Symantec LiveUpdate, and HTTP 1.1 versus HTTP 1.0

2009-03-26 Thread Gavin McCullagh
Hi,

On Wed, 25 Mar 2009, Marcus Kool wrote:

> The story about Squid and HTTP 1.1 is long...
>
> To get your LiveUpdate working ASAP you might want to
> fiddle with the firewall rules and to NOT redirect
> port 80 traffic of Symantec servers to Squid, but
> simply let the traffic pass.

We're running the squid version packaged for Ubuntu Hardy
(2.6.18-1ubuntu3).  We run it in as both an explicitly configured and as a
transparent proxy.

I hadn't realised the lack of HTTP/1.1 in squid would break websites.  Are
there many such websites?

Is this only in the transparent situation or is it whenever you go through
squid?  Is there any version of squid which supports HTTP/1.1 or works
around this yet?

Gavin



Re: [squid-users] Squid, Symantec LiveUpdate, and HTTP 1.1 versus HTTP 1.0

2009-03-26 Thread Gavin McCullagh
Hi,

On Thu, 26 Mar 2009, Amos Jeffries wrote:

> A very few.  Pressure is on them to fix up when they break, so it's not
> common, fortunately.

Phew.  I guess if we need to, I can alter our wpad.dat and policy filter
to dictate direct access to Norton updates, though I'd really rather not
(a sketch of what that might look like is below).  I do see this sort of
error in the logs occasionally.

Mar 26 11:17:50 proxy squid[2969]: parseHttpRequest: Invalid HTTP version   
 
Mar 26 11:17:51 proxy squid[2969]: parseHttpRequest: Invalid HTTP version   
 
Mar 26 11:17:54 proxy squid[2969]: parseHttpRequest: Invalid HTTP version 

Actually that's from a different proxy server running Debian and
2.6.5-6etch4.
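If we did go down that road, the wpad.dat change would be something like
this (a sketch; the Symantec hostnames and our proxy address are
assumptions):

  function FindProxyForURL(url, host) {
      // hypothetical bypass: send LiveUpdate hosts direct
      if (shExpMatch(host, "liveupdate.symantec.com") ||
          shExpMatch(host, "*.symantecliveupdate.com"))
          return "DIRECT";
      return "PROXY proxy.gcd.ie:8080";
  }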

> Part of the HTTP/1.1 spec requires that HTTP/1.0 visitors be accepted  
> and dealt with properly. So the sites are in violation by using the 1.1  
> moniker when they can't handle critical parts of the spec. (This is one  
> of the main reasons Squid still says 1.0).

I see.

>> Is this only in the transparent situation or is it whenever you go through
>> squid?  Is there any version of squid which supports HTTP/1.1 or works
>> around this yet?
>
> Squid-2.7 can tell servers it is 1.1, but cannot to the client-side part.

Does it help to tell the server you're using 1.1?  Will the server not then
respond using 1.1 features which squid doesn't support?

Gavin



[squid-users] caching googlevideo.com with squid

2009-03-31 Thread Gavin McCullagh
Hi,

running squid in transparent mode for our network, we're trying to
maximise our byte cache hit ratio.  Isn't everyone ;-)

I've been looking over reports generated by SRG and I guess I shouldn't be
surprised to find that youtube/googlevideo (when you add up the 250
different mirror servers) accounts for around 15% of our bandwidth usage
(c. 28GB per day in c. 300 requests) and has a 0% hit rate.  This seems
worthy of some effort.

So, I've been looking at these pages:

http://wiki.squid-cache.org/ConfigExamples/DynamicContent/YouTube
http://wiki.squid-cache.org/ConfigExamples/DynamicContent/YouTube/Discussion

I gather I'd need to upgrade to squid 2.7 (for storeurl) which I can do.
The remainder looks doable, if a little complex.
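From the wiki page, the squid.conf side amounts to a handful of 2.7
directives plus the helper (a sketch; the regex and helper path are
abbreviated from the wiki example):

  acl store_rewrite_list url_regex -i (get_video\?|videodownload\?)
  storeurl_access allow store_rewrite_list
  storeurl_access deny all
  storeurl_rewrite_program /usr/local/bin/storeurl.pl
  storeurl_rewrite_children 5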

Is there a big performance hit involved in spawning the perl script?  I
guess those ACLs will get hit relatively rarely so perhaps not?  

I guess the perl script needs to be modified every time google/youtube
change subdomain names.  Can I presume this doesn't happen too often?

Does anyone else have any experience or advice on this?  What sort of hit
rate do you tend to get in practice?  There's a lot of video on
goodlevideo, but I guess certain videos tend to become highly popular for a
few days.

Many thanks in advance for any suggestions,
Gavin



[squid-users] understanding how squid disk load scales

2009-04-02 Thread Gavin McCullagh
Hi,

our squid system (according to our munin graphs) is suffering rather from
high iowait.  I'm also seeing warnings of disk i/o overloading.

I'm interested to understand how this disk load scales.  I know more disks
(we only have a single cache disk just now) would be a big help.  One
question I have is how (and if) the disk load scales with the size of the
cache.

I'll present a ludicrously simplistic description of how disk load might
scale (purely as a starting point) and see if people can point out where
I'm wrong.

The job a single disk running a cache must do in some time step might be:

   disk_work = (write_cached_data) + (cache_replacement_policy) + (read_hits)

where:
   (write_cached_data) =~ x * (amount_downloaded)
   (cache_replacement_policy) = (remove_expired_data) + (LRU,LFUDA,...)
   (read_hits) =~ byte_hit_rate
   (LRU,LFUDA,...) =~ amount of space needed =~ x * (amount_downloaded)
   (remove_expired_data) =~ (amount_downloaded) over previous time

so
  disk_work = f(amount_downloaded,byte_hit_rate,cache_replacement_policy)

To me this speculative analysis suggests that the load on the disk is a
function of the byte_hit_rate and the amount being downloaded, but not of
the absolute cache size.

So, decreasing the cache_dir size might lower the disk load, but only as it
lowers the byte_hit_rate (and possibly the seek time on the disk I guess).
Is there something wrong in this?
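As a very rough sanity check with our own numbers (150-200GB downloaded per
day, ~13% byte hit rate), the averages come out modest:

  write_cached_data <= 200GB/day / 86400s  ~ 2.3MB/s written
  read_hits          ~ 0.13 * bytes served ~ 0.3MB/s read back

which suggests the pain is in seeks over many small objects rather than in
raw throughput.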

Gavin



Re: [squid-users] Squid Scalability

2009-04-03 Thread Gavin McCullagh
On Fri, 03 Apr 2009, Amos Jeffries wrote:

> Despite many years of asking, few people have ever supplied the squid  
> project with relevant benchmarking info. We depend on volunteers so  
> there are no hard numbers available publicly yet.

Is there a doc stating exactly what benchmarks you want and how to present
them?  I'm sure I could pull some out.

Gavin



Re: [squid-users] Squid Scalability

2009-04-05 Thread Gavin McCullagh
Hi,

On Sat, 04 Apr 2009, Amos Jeffries wrote:

> For now what we need are the hit/miss ratios and user numbers from Squid  
> under peak load, and a few other details to guide comparisons.
>
>   http://wiki.squid-cache.org/KnowledgeBase/Benchmarks
> details what we are looking for right now and where to locate it.

Here's our current situation:


Version: 2.6.STABLE18 (Ubuntu Hardy Package)
OS: 32-Bit Ubuntu GNU/Linux (Hardy)
CPU: Dual Core Intel(R) Xeon(R) CPU  3050  @ 2.13GHz
RAM: 8GB
HDD: 2x SATA disks (150GB, 1TB)
Cache: 1x 600GB
Users: ~3000
RPS: 130
Hit Ratio: 35-40%
Byte Hit Ratio: ~13%

Submitted by: Gavin McCullagh, Griffith College Dublin
With this hit ratio and cache size, substantial cpu time is spent in iowait
as the disk is overloaded.  Reducing the cache to 450GB relieves this, but
the hit rate drops to more like 10-11%.


I'm going to put a second 1TB disk in to replace the 150GB and have a
second large cache_dir so this should improve.

Gavin




Re: [squid-users] Squid Scalability

2009-04-06 Thread Gavin McCullagh
Hi,

On Mon, 06 Apr 2009, Amos Jeffries wrote:

> Thank you. Added.
> What sort of CPU load does it run under?

Very high, but the web still feels reasonably responsive in general.  The
load average peaked yesterday at 9, though that's since I reduced the cache
size.  It hit 30 last week, which is when I decided the cache size (or
perhaps the hit rate) was too much for the disk.

http://deathcab.gcd.ie/munin/gcd.ie/watcher.gcd.ie.html
http://deathcab.gcd.ie/munin/gcd.ie/watcher.gcd.ie-squid_response_time.html
http://deathcab.gcd.ie/munin/gcd.ie/watcher.gcd.ie-load.html

> And being linux is it running AUFS cache_dir?

It is indeed. On reiserfs with noatime,nodiratime,notail and heap LFUDA.
1.7GB RAM cache.

Gavin



Re: [squid-users] Squid Scalability

2009-04-06 Thread Gavin McCullagh
On Mon, 06 Apr 2009, Amos Jeffries wrote:

> Ah, sorry I meant CPU load as reported by Squid in %:
>
> "It can be extracted from the "general runtime information" or "info"  
> cachemgr page. It's the value marked "CPU Usage" "

I'll hold off until a peak time and check.  If it's similar, "top"
generally shows relatively low cpu usage for the squid process.

The munin CPU usage graph also gives a guide (the blue spikes are SRG
running late at night).

http://deathcab.gcd.ie/munin/gcd.ie/watcher.gcd.ie-cpu.html

> Okay, we've reached the edge of my storage-specific performance  
> knowledge. I hope someone knows a bit more and can educate us both on a  
> good fix :)

Mine too.  The operating system is on linux software RAID1 partitions so I
can swap out the smaller disk, pop in a second 1TB disk, sync the OS
partitions and use the remainder as a second large cache.  I intend doing
that in the next week or so which should hopefully allow me to up our hit
rate a bit.
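The mechanics of the swap, roughly (a sketch; device and partition names
are illustrative):

  mdadm /dev/md0 --add /dev/sdb1   # matching partition on the new 1TB disk
  cat /proc/mdstat                 # wait for the RAID1 resync to finish
  mkfs.reiserfs /dev/sdb4          # leftover space becomes the second cache
  # then add a second cache_dir to squid.conf and run squid -z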

I'm not certain yet if the analysis I posted before is correct but if it
is, it would appear a single SATA disk of this sort can only sustain about
a 12-13% byte hit rate at our throughput (150-200GB per day, peaking around
40Mb/sec).  I
suspect if I came up with a way to improve our hit rate further (eg youtube
caching), the iowait trouble would just get worse.  Hopefully a second disk
will allow this to expand further.  After that, I guess I'll have to look
at a new server.

Gavin



Re: [squid-users] Squid Scalability

2009-04-06 Thread Gavin McCullagh
Hi,

On Tue, 07 Apr 2009, Amos Jeffries wrote:

> Gavin McCullagh wrote:

>> Mine too.  The operating system is on linux software RAID1 partitions so I
>
> Ah, there we probably have the answer as to why there is so much iowait.

I'm not convinced of that.  The iowait seems to grow directly as a function
of the cache size and the caches themselves are not RAIDed.  You can see
that I recently reduced the cache size and got an immediate, substantial
reduction in iowait.

http://deathcab.gcd.ie/munin/gcd.ie/watcher.gcd.ie.html#Squid
http://deathcab.gcd.ie/munin/gcd.ie/watcher.gcd.ie.html#System

> You may want to find out how to determine RAID iowait vs other iowait and
> see what shows up.

That would be interesting alright.  I'll see what I can find out.

> Though I only had 2x 250GB disks on 2.6GHz box RAID1, Squid maxed out  
> and started seriously lagging requests under 3 users (at around ~5Mbps  
> wild guess).

Did you have the cache on RAID1 or the OS or both?  Hardware or software
RAID (not that my using software should improve anything of course)?

> I shifted to a *slower* 1.8GHz box with single OS-shared disk and it now  
> serves 15 users without sweating and runs dozens of reverse-proxy
> domains as a side job.

Squid's usage of CPU time doesn't seem to be an issue for us at all so I
can well believe that.

> My review of RAID + Squid was overruled by some RAID experts with more  
> experience. I'm still puzzled how they got the evidence for  
> "performance: quite good" on software RAID though, maybe dual-core  
> minimum, mine are both singles.

I can certainly see how putting the cache on software RAID1 is a bad plan,
but that's not what I've done, and the iowait is sensitive to cache size.
I have the squid logs and the squid cache on single disk partitions.  I
don't think the OS should be loading the disk much.

Many thanks for your help on this,

Gavin