Re: [squid-users] question about filesystems and directories for cache.

2007-11-27 Thread Amos Jeffries

Tony Dodd wrote:

Chris Robertson wrote:
First of all, thanks for sharing the write-up.  There are a number of 
high-load squid installations (Wikipedia and Flickr are two of the 
largest I know of), but not much information on what tweaks to make in 
the interest of performance.
No problem. =]  I encountered the same problem when trying to figure out 
how to get more performance so I figured once I'd cracked it, the least 
I could do was document it for the other people having the same issue 
(and to give myself a reference for later).


After perusing your posting, I'm wondering if you would run a 
squidclient -p 80 mgr:info |grep method.  I'm making the assumption 
that your squid is listening on port 80, so please change the argument 
to -p if needed.  Your configuration options included --enable-poll, 
but with a 2.6 kernel and 2.6 sources, I would be surprised if you are 
not actually using epoll.  It might be a superfluous compile option.

[EMAIL PROTECTED] ~]# squidclient -p 8081 mgr:info |grep method
   IO loop method: poll


Hmm, as Adrian said, try adding --enable-epoll to your options; that 
should theoretically give a similar improvement over poll as aufs gives 
over ufs.
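
A minimal sketch of the check discussed above, to run before and after rebuilding with --enable-epoll. The helper name io_loop_method is hypothetical; only squidclient, the mgr:info page and port 8081 come from the thread.

```shell
# Hypothetical helper: read `mgr:info` output on stdin and print only
# the value of the "IO loop method" line, whatever squid reports there.
io_loop_method() {
    sed -n 's/^[[:space:]]*IO loop method:[[:space:]]*//p'
}

# Example using the line Tony posted. Against a live proxy you would run:
#   squidclient -p 8081 mgr:info | io_loop_method
printf '   IO loop method: poll\n' | io_loop_method
```

If the value is unchanged after the rebuild, the new configure option probably did not take effect in the binary that is actually running.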


Also, since you are building from source, try the absolute latest 2.6 
release. Adrian has optimisation work underway that is showing some 
noticeable speed improvements across the 2.6-teens.


Cache digests are not the only method of sharing between peers.  ICP 
is an alternative and I have read that multicast works well for 
scaling beyond a handful of peers.  I can't seem to find the posting 
now that I want to reference it.  I'd trust your experience over my 
memory of someone else's posting, but I thought I would raise the 
possibility.
I was under the impression that when utilizing cache peering, it worked 
better if the squids had a digest of what was on X squid server, before 
asking for it.  I could be wrong on that though - Adrian, care to 
comment on this one?  It's now redundant in my situation though, as 
every peering mechanism is slower than going back to parent in our use 
case.


Theoretically yes. Practically ... there are incremental and cyclic 
digest methods. The former is not much better than multicast ICP. The 
latter suffers from periodic minor update delays. But none have been 
adequately benchmarked in squid AFAIK.

... Side project anyone?

I'm surprised you had to specify your hosts file in your squid.conf.  
/etc/hosts is the default.
There are a couple of bugs in squid that seem to cause issues if you 
don't actually specify the hosts file within the squid conf... worst 
case, it's an extra line of config to parse on startup.


Are these bugs in bugzilla? Please add asap if not.



Lastly, I'd be wary of specifying dns_nameservers as a squid.conf 
option.  Squid will use the servers specified in /etc/resolv.conf if 
this option is not specified.  Now you have to maintain name servers 
in two locations.
Same goes here; DNS lookups were taking  200-1000ms without specifying 
dns_nameservers in the config (the nameservers specified there are the 
same ones within /etc/resolv.conf), now they're sub 1ms.  There isn't 
much chance of us re-ip-ing internally, so it's a pretty safe config 
option for us.  I definitely agree that it could cause problems for 
people using public DNS resolution though.


Hmm, glad it works for you.

I think it might have something to do with other settings in 
resolv.conf, namely 'search' and 'domain' which can result in NXDOMAIN 
results leading to several lookups. The default may also include a 
legacy host name lookup, where dns_nameservers might cause a bypass 
(although I don't have time to check the code and confirm that).
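
One way to see which resolver settings are in play is sketched below. The helper name resolver_settings is made up for illustration; it only lists the nameserver, search and domain lines so they can be compared by eye with the dns_nameservers line in squid.conf. It does not confirm the theory above.

```shell
# Hypothetical helper: list the nameserver, search and domain lines of a
# resolv.conf-style file. $1 is the file to read (default: /etc/resolv.conf).
resolver_settings() {
    grep -E '^[[:space:]]*(nameserver|search|domain)[[:space:]]' "${1:-/etc/resolv.conf}"
}

# Usage on the squid box:
#   resolver_settings
#   resolver_settings /path/to/a/copy/of/resolv.conf
```

If 'search' or 'domain' lines show up, each unqualified or failed lookup may be retried with those suffixes appended, which would fit the extra NXDOMAIN round trips described above.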


Worth a note, though. This is going into my todo pile for a later check.

Amos


Re: [squid-users] question about filesystems and directories for cache.

2007-11-26 Thread Matias Lopez Bergero
Tony Dodd wrote:
 Matias Lopez Bergero wrote:
 Hello,

 snip

 I've been reading the wiki and the mailing list to find out which is the
 best filesystem to use. For now I have chosen ext3 based on comments on
 the list; I have also passed the nodev,nosuid,noexec,noatime flags in
 fstab to get better security and faster performance.

 snip

 Hi Matias,

 I'd personally recommend against ext3, and point you towards reiserfs.
 ext3 is horribly slow for many small files being read/written at the
 same time.  I'd also recommend maximizing your disk throughput, by
 splitting the raid, and having a cache-dir on each disk; though of
 course, you'll lose redundancy in the event of a disk failure.

 I wrote a howto that revolves around maximizing squid performance,
 take a look at it, you may find it helpful:
 http://blog.last.fm/2007/08/30/squid-optimization-guide

Thank you
I'll try that!

Regards,
Matías.


Re: [squid-users] question about filesystems and directories for cache.

2007-11-26 Thread Chris Robertson

Tony Dodd wrote:

Matias Lopez Bergero wrote:

Hello,


snip


I've been reading the wiki and the mailing list to find out which is the
best filesystem to use. For now I have chosen ext3 based on comments on
the list; I have also passed the nodev,nosuid,noexec,noatime flags in
fstab to get better security and faster performance.


snip

Hi Matias,

I'd personally recommend against ext3, and point you towards reiserfs. 
ext3 is horribly slow for many small files being read/written at the 
same time.  I'd also recommend maximizing your disk throughput, by 
splitting the raid, and having a cache-dir on each disk; though of 
course, you'll lose redundancy in the event of a disk failure.


I wrote a howto that revolves around maximizing squid performance, 
take a look at it, you may find it helpful: 
http://blog.last.fm/2007/08/30/squid-optimization-guide




Hi Tony,

First of all, thanks for sharing the write-up.  There are a number of 
high-load squid installations (Wikipedia and Flickr are two of the 
largest I know of), but not much information on what tweaks to make in 
the interest of performance.


After perusing your posting, I'm wondering if you would run a 
squidclient -p 80 mgr:info |grep method.  I'm making the assumption 
that your squid is listening on port 80, so please change the argument 
to -p if needed.  Your configuration options included --enable-poll, 
but with a 2.6 kernel and 2.6 sources, I would be surprised if you are 
not actually using epoll.  It might be a superfluous compile option.


Cache digests are not the only method of sharing between peers.  ICP is 
an alternative and I have read that multicast works well for scaling 
beyond a handful of peers.  I can't seem to find the posting now that I 
want to reference it.  I'd trust your experience over my memory of 
someone else's posting, but I thought I would raise the possibility.


I'm surprised you had to specify your hosts file in your squid.conf.  
/etc/hosts is the default.


Lastly, I'd be wary of specifying dns_nameservers as a squid.conf 
option.  Squid will use the servers specified in /etc/resolv.conf if 
this option is not specified.  Now you have to maintain name servers in 
two locations.


Chris


Re: [squid-users] question about filesystems and directories for cache.

2007-11-26 Thread Tony Dodd

Chris Robertson wrote:
First of all, thanks for sharing the write-up.  There are a number of 
high-load squid installations (Wikipedia and Flickr are two of the 
largest I know of), but not much information on what tweaks to make in 
the interest of performance.
No problem. =]  I encountered the same problem when trying to figure out 
how to get more performance so I figured once I'd cracked it, the least 
I could do was document it for the other people having the same issue 
(and to give myself a reference for later).


After perusing your posting, I'm wondering if you would run a 
squidclient -p 80 mgr:info |grep method.  I'm making the assumption 
that your squid is listening on port 80, so please change the argument 
to -p if needed.  Your configuration options included --enable-poll, 
but with a 2.6 kernel and 2.6 sources, I would be surprised if you are 
not actually using epoll.  It might be a superfluous compile option.

[EMAIL PROTECTED] ~]# squidclient -p 8081 mgr:info |grep method
   IO loop method: poll
Cache digests are not the only method of sharing between peers.  ICP 
is an alternative and I have read that multicast works well for 
scaling beyond a handful of peers.  I can't seem to find the posting 
now that I want to reference it.  I'd trust your experience over my 
memory of someone else's posting, but I thought I would raise the 
possibility.
I was under the impression that when utilizing cache peering, it worked 
better if the squids had a digest of what was on X squid server, before 
asking for it.  I could be wrong on that though - Adrian, care to 
comment on this one?  It's now redundant in my situation though, as 
every peering mechanism is slower than going back to parent in our use case.
I'm surprised you had to specify your hosts file in your squid.conf.  
/etc/hosts is the default.
There are a couple of bugs in squid that seem to cause issues if you 
don't actually specify the hosts file within the squid conf... worst 
case, it's an extra line of config to parse on startup.


Lastly, I'd be wary of specifying dns_nameservers as a squid.conf 
option.  Squid will use the servers specified in /etc/resolv.conf if 
this option is not specified.  Now you have to maintain name servers 
in two locations.
Same goes here; DNS lookups were taking  200-1000ms without specifying 
dns_nameservers in the config (the nameservers specified there are the 
same ones within /etc/resolv.conf), now they're sub 1ms.  There isn't 
much chance of us re-ip-ing internally, so it's a pretty safe config 
option for us.  I definitely agree that it could cause problems for 
people using public DNS resolution though.


--
Tony Dodd, Systems Administrator

Last.fm | http://www.last.fm
Karen House 1-11 Baches Street
London N1 6DL

check out my music taste at:
http://www.last.fm/user/hawkeviper 



Re: [squid-users] question about filesystems and directories for cache.

2007-11-26 Thread Adrian Chadd
 not actually using epoll.  It might be a superfluous compile option.

 [EMAIL PROTECTED] ~]# squidclient -p 8081 mgr:info |grep method
IO loop method: poll

Try --enable-epoll and see if your caches are faster?




Adrian



Re: [squid-users] question about filesystems and directories for cache.

2007-11-24 Thread Tony Dodd

Matias Lopez Bergero wrote:

Hello,


snip


I've been reading the wiki and the mailing list to find out which is the
best filesystem to use. For now I have chosen ext3 based on comments on
the list; I have also passed the nodev,nosuid,noexec,noatime flags in
fstab to get better security and faster performance.


snip

Hi Matias,

I'd personally recommend against ext3, and point you towards reiserfs. 
ext3 is horribly slow for many small files being read/written at the 
same time.  I'd also recommend maximizing your disk throughput, by 
splitting the raid, and having a cache-dir on each disk; though of 
course, you'll lose redundancy in the event of a disk failure.
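
A rough sketch of the one-cache-dir-per-disk layout follows. It assumes each disk is mounted separately, and it emits squid.conf cache_dir lines for them. The helper name cache_dir_lines, the mount points /cache1 and /cache2, the 20000 MB size and the choice of aufs are all illustrative assumptions, not settings taken from the thread. The 16 and 256 are squid's usual first- and second-level directory counts.

```shell
# Hypothetical helper: print one aufs cache_dir line per mount point.
# $1 is the cache size in MB; the remaining arguments are mount points.
cache_dir_lines() {
    size_mb=$1
    shift
    for dir in "$@"; do
        printf 'cache_dir aufs %s %s 16 256\n' "$dir" "$size_mb"
    done
}

# Example: two independent disks, one cache directory on each.
cache_dir_lines 20000 /cache1 /cache2
```

The printed lines can be pasted into squid.conf. As noted above, with the mirror split a single disk failure loses that disk's share of the cache.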


I wrote a howto that revolves around maximizing squid performance, take 
a look at it, you may find it helpful: 
http://blog.last.fm/2007/08/30/squid-optimization-guide


--
Tony Dodd, Systems Administrator

Last.fm | http://www.last.fm
Karen House 1-11 Baches Street
London N1 6DL

check out my music taste at:
http://www.last.fm/user/hawkeviper


Re: [squid-users] question about filesystems and directories for cache.

2007-11-24 Thread Alexandre Correa
reiserfs 4 is much better than ext3 ...

On Nov 24, 2007 9:55 PM, Tony Dodd [EMAIL PROTECTED] wrote:
 Matias Lopez Bergero wrote:
  Hello,
 
 snip
 
  I've been reading the wiki and the mailing list to find out which is the
  best filesystem to use. For now I have chosen ext3 based on comments on
  the list; I have also passed the nodev,nosuid,noexec,noatime flags in
  fstab to get better security and faster performance.
 
 snip

 Hi Matias,

 I'd personally recommend against ext3, and point you towards reiserfs.
 ext3 is horribly slow for many small files being read/written at the
 same time.  I'd also recommend maximizing your disk throughput, by
 splitting the raid, and having a cache-dir on each disk; though of
 course, you'll lose redundancy in the event of a disk failure.

 I wrote a howto that revolves around maximizing squid performance, take
 a look at it, you may find it helpful:
 http://blog.last.fm/2007/08/30/squid-optimization-guide

 --
 Tony Dodd, Systems Administrator

 Last.fm | http://www.last.fm
 Karen House 1-11 Baches Street
 London N1 6DL

 check out my music taste at:
 http://www.last.fm/user/hawkeviper




-- 

Sds.
Alexandre J. Correa
Onda Internet / OPinguim.net
http://www.ondainternet.com.br
http://www.opinguim.net


Re: [squid-users] question about filesystems and directories for cache.

2007-11-24 Thread Adrian Chadd
On Sat, Nov 24, 2007, Alexandre Correa wrote:
 reiserfs 4 is much better than ext3 ...

[citation needed]

I know reiserfs vs ext2|3 benchmarks in the past showed reiserfs did a little
better but both codebases have advanced over the last few years.
I'd love to see an actual up to date comparison.




Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -


Re: [squid-users] question about filesystems and directories for cache.

2007-11-24 Thread Tony Dodd
Quoting Adrian Chadd [EMAIL PROTECTED]:
 On Sat, Nov 24, 2007, Alexandre Correa wrote:
  reiserfs 4 is much better than ext3 ...

 [citation needed]

 I know reiserfs vs ext2|3 benchmarks in the past showed reiserfs did a little
 better but both codebases have advanced over the last few years.
 I'd love to see an actual up to date comparison.

All the benchmarking I performed while testing ext3 vs xfs vs reiserfs for squid
showed that reiserfs gave the best bang per buck for io intensive small file
operations...  That said, I too would like some definitive numbers/graphs for
comparison in different settings.  Perhaps next time I rebuild one of my squid
boxes, I'll run some benchmarks and document them.

--
Tony Dodd, Systems Administrator

Last.fm | http://www.last.fm
Karen House 1-11 Baches Street
London N1 6DL

check out my music taste at:
http://www.last.fm/user/hawkeviper





[squid-users] question about filesystems and directories for cache.

2007-11-23 Thread Matias Lopez Bergero
Hello,

I'm installing a new squid server (I have a couple running already), but
this one is going to serve as a gateway for about 450 clients. I have a good
piece of hardware for it, but only two hard discs in a RAID 1
mirror. I'd like to get the best performance out of this server, and I
think iowait will be the bottleneck of this setup. So I'm
looking to configure the system in the most optimal way...

I've been reading the wiki and the mailing list to find out which is the
best filesystem to use. For now I have chosen ext3 based on comments on
the list; I have also passed the nodev,nosuid,noexec,noatime flags in
fstab to get better security and faster performance.
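
For reference, a small sketch for confirming that those flags are actually in effect on the mounted cache filesystem. The helper name mount_options, the device /dev/sdb1 and the mount point /cache1 are hypothetical. Reading /proc/mounts assumes a Linux system.

```shell
# Illustrative fstab entry with the flags mentioned above (device and
# mount point are made up):
#   /dev/sdb1  /cache1  ext3  nodev,nosuid,noexec,noatime  0  2

# Hypothetical helper: print the active mount options for a mount point.
# $1 is the mount point; $2 is the mounts table (default: /proc/mounts).
mount_options() {
    awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}"
}

# Usage on the squid box:
#   mount_options /cache1
```

If noatime is missing from the output, the fstab change has not been applied yet; the filesystem needs a remount or a reboot.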

I am not sure how to set up the caching directories: would it be
better to have one directory to store the cache, or more than
one? And should I use ufs, aufs or diskd?
For now, based on comments on the wiki, I have chosen to have four
directories using diskd.

I would like to know what you guys think about this, or if you have
any comments or experience with these little tweaks to improve performance.

Any comments are welcome,

BR,
Matías