Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On 12/03/2013 8:11 a.m., paulm wrote: Excuse me David, what are the ab parameters that you use to test against squid? -n for request count, -c for concurrency level. SMP Squid shares its listening port between workers, so even -c 1 will still exercise both workers, but the results get more interesting as you vary client concurrency versus request count. For a worst-case traffic scenario, test with a guaranteed MISS response; for the best case, test with a small HIT response. Other than that, whatever you like. Using an FQDN you host yourself is polite. Amos
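For concreteness, a pair of ab runs along those lines might look like the following sketch (the proxy address, URLs, and counts are placeholders, not values from this thread; a config with "no_cache deny all", as used in the test config later in the thread, guarantees the MISS case):

# worst case: every response is a MISS
ab -n 100000 -c 50 -X proxy.test.invalid:8000 http://origin.test.invalid/nocache.html
# best case: a small cacheable object, so every response after the first is a HIT
ab -n 100000 -c 50 -X proxy.test.invalid:8000 http://origin.test.invalid/small.html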
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
Excuse me David, what are the ab parameters that you use to test against squid? Thanks, Paul
RE: [squid-users] squid 3.2.0.5 smp scaling issues
On Sun, 12 Jun 2011, Jenny Lee wrote: With tcp_fin_timeout set at the theoretical minimum of 12 secs, we can do 5K req/s with 64K ports. Setting tcp_fin_timeout had no effect for me. Apparently there is conflicting / outdated information everywhere, and I could not lower TIME_WAIT from its default of 60 secs, which is hardcoded into include/net/tcp.h. But I doubt this would have any effect when you are constantly loading the machine. Making localhost-to-localhost connections didn't help either. I am not a network guru, so of course I am probably doing things wrong. But no matter how wrong you do stuff, it cannot escape brute-forcing :) And I have tried everything! I can't do more than 450-470 reqs/sec even with 200K in "/proc/sys/net/netfilter/nf_conntrack_max" and "/sys/module/nf_conntrack/parameters/hashsize". This lets me bypass "CONNTRACK table full" issues, but my ports run out. Could you be kind enough to specify which OS you are using and whether you are running the benches for extended periods of time? Any TCP tuning options you are using would also be very useful. Of course, when you are back in the office. As I mentioned, we find your work on acls and workers valuable. I'm running Debian with custom-built kernels. In the testing that I have done over the years, I have had tests at 6000+ connections/sec through forking proxies (that only log when they get a new connection, with connection rates calculated from the proxy's logs, so I know they aren't using persistent or keep-alive connections). unfortunately the machine in my lab with squid on it is unplugged right now. I can get at the machines running ab and apache remotely, so I can hopefully get logged in and give you the kernel settings in the next couple of days (things are _extremely_ hectic through most of monday, so it'll probably be monday night or tuesday before I get a chance). David Lang
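For reference, the knobs discussed in this exchange map onto the following Linux settings; a sketch with illustrative values, not recommendations. (tcp_fin_timeout governs FIN-WAIT-2; the 60-second TIME_WAIT is TCP_TIMEWAIT_LEN in include/net/tcp.h, as noted above.)

# the per-IP-pair ceiling is roughly ports / hold-down seconds:
#   64000 / 60s ~= 1066 req/s        64000 / 12s ~= 5333 req/s
sysctl -w net.ipv4.ip_local_port_range="1024 65535"    # widen the ephemeral range
sysctl -w net.ipv4.tcp_fin_timeout=12
sysctl -w net.netfilter.nf_conntrack_max=200000        # avoid "conntrack table full" drops
echo 200000 > /sys/module/nf_conntrack/parameters/hashsize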
Re: [squid-users] squid 3.2.0.5 smp scaling issues
On Sun, 12 Jun 2011, Amos Jeffries wrote: On 12/06/11 22:20, Jenny Lee wrote: Ok, I am assuming that persistent-connections are on. This doesn't simulate any real life scenario. What do you mean by that? it is the basic requirement for access to the major HTTP/1.1 performance features. ON is the default.
some of the proxies that I've been testing don't support this (and don't support HTTP/1.1), so I am sure that my tests are not using persistent connections. using the old firewall toolkit http-gw (which forks a new process for every incoming connection and doesn't even support all HTTP/1.0 features), I've seen >4000 requests/sec. I've got systems in production that routinely top 1000 connections/sec between one source and one destination. David Lang I would like to know if anyone can do more than 500 reqs/sec with persistent connections off. Jenny Good question. Anyone? These are our collected reports: http://wiki.squid-cache.org/KnowledgeBase/Benchmarks They are all actual production network traffic rates. The actual benchmark tests like David's have been kept out since we have no standard set to make them comparable.
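One reproducibility note: ab only uses HTTP keep-alive when given -k, so the persistent and non-persistent cases are easy to compare from the client side (host and counts are placeholders):

ab -n 100000 -c 50 http://origin.test.invalid/small.html        # a new TCP connection per request
ab -k -n 100000 -c 50 http://origin.test.invalid/small.html     # keep-alive: connections are reused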
Re: [squid-users] squid 3.2.0.5 smp scaling issues
On 12/06/11 22:20, Jenny Lee wrote: Ok, I am assuming that persistent-connections are on. This doesn't simulate any real life scenario.
What do you mean by that? it is the basic requirement for access to the major HTTP/1.1 performance features. ON is the default. I would like to know if anyone can do more than 500 reqs/sec with persistent connections off. Jenny Good question. Anyone? These are our collected reports: http://wiki.squid-cache.org/KnowledgeBase/Benchmarks They are all actual production network traffic rates. The actual benchmark tests like David's have been kept out since we have no standard set to make them comparable. Amos -- Please be using Current Stable Squid 2.7.STABLE9 or 3.1.12 Beta testers wanted for 3.2.0.8 and 3.1.12.2
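For reference, the squid.conf directives in question, both of which default to on per the Squid documentation:

client_persistent_connections on
server_persistent_connections on

Turning either off forces a new TCP connection per request on that side of the proxy, which is the non-persistent case being asked about.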
RE: [squid-users] squid 3.2.0.5 smp scaling issues
On Sun, 12 Jun 2011, Jenny Lee wrote: Ok, I am assuming that persistent-connections are on. This doesn't simulate any real life scenario.
I would like to know if anyone can do more than 500 reqs/sec with persistent connections off. I'm not using persistent connections. I do this same sort of testing to validate various proxies that don't support persistent connections. I'm remembering the theoretical max of the TCP stack (from one source IP to one destination IP) as being ~16K requests/sec, but I don't have references to point to at the moment. David Lang
RE: [squid-users] squid 3.2.0.5 smp scaling issues
> Date: Sun, 12 Jun 2011 03:02:23 -0700 > From: da...@lang.hm > To: bodycar...@live.com > CC: squ...@treenet.co.nz; squid-users@squid-cache.org > Subject: RE: [squid-users] squid 3.2.0.5 smp scaling issues > > I'll post my configs when I get back to the office, but one thing is that > if you send requests faster than they can be serviced the pending requests > build up until you start getting timeouts. so I have to tinker with the > number of requests that can be sent in parallel to keep the request rate > below this point. > > note that when I removed the long list of ACLs I was able to get this 13K > requests/sec rate going from machine A to squid on machine B to apache on > machine C so it's not a localhost thing. > > getting up to the 13K rate on apache does require doing some tuning and > tweaking of apache, stock configs that include dozens of dynamically > loaded modules just can't achieve these speeds. These are also fairly > beefy boxes, dual quad core opterons with 64G ram and 1G ethernet > (multiple cards, but I haven't tried trunking them yet) > > David Lang Ok, I am assuming that persistent-connections are on. This doesn't simulate any real life scenario. I would like to know if anyone can do more than 500 reqs/sec with persistent connections off. Jenny
RE: [squid-users] squid 3.2.0.5 smp scaling issues
On Sun, 12 Jun 2011, Jenny Lee wrote: When I was running my benches on loopback, I had tons of TIME_WAITs for 127.0.0.1 and squid would bail out with: "commBind: Cannot bind socket..." Of course, I might be doing things wrong. I am interested in what to optimize on RHEL6 OS level to achieve higher requests per second. Jenny I'll post my configs when I get back to the office, but one thing is that if you send requests faster than they can be serviced the pending requests build up until you start getting timeouts, so I have to tinker with the number of requests that can be sent in parallel to keep the request rate below this point. note that when I removed the long list of ACLs I was able to get this 13K requests/sec rate going from machine A to squid on machine B to apache on machine C, so it's not a localhost thing. getting up to the 13K rate on apache does require doing some tuning and tweaking of apache; stock configs that include dozens of dynamically loaded modules just can't achieve these speeds. These are also fairly beefy boxes: dual quad-core Opterons with 64G RAM and 1G ethernet (multiple cards, but I haven't tried trunking them yet). David Lang
RE: [squid-users] squid 3.2.0.5 smp scaling issues
> Date: Sun, 12 Jun 2011 19:54:10 +1200 > From: squ...@treenet.co.nz > To: squid-users@squid-cache.org > Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues > > Couple of things to note. > Firstly that this was an ab (apache bench) reported figure. It > calculates the software limitation based on speed of transactions done. > Not necessarily accounting for things like TIME_WAIT. Particularly if it > was extrapolated from say, 50K requests, which would not hit that OS limit. Ab accounts for 200-OK responses and TIME_WAITs cause squid to issue 500. Of course if you send in 50K it would not be subject to this, but I usually send a couple 10+ million to simulate load at least for a while. > He also mentioned using a "local IP address". If that was on the lo > interface it would not be subject to things like TIME_WAIT or RTT lag. When I was running my benches on loopback, I had tons of TIME_WAITs for 127.0.0.1 and squid would bail out with: "commBind: Cannot bind socket..." Of course, I might be doing things wrong. I am interested in what to optimize on RHEL6 OS level to achieve higher requests per second. Jenny
Re: [squid-users] squid 3.2.0.5 smp scaling issues
On 12/06/11 18:46, Jenny Lee wrote: I like to know how you are able to do >13000 requests/sec. tcp_fin_timeout is 60 seconds default on all *NIXes and available ephemeral port range is 64K. I can't do more than 1K requests/sec even with tcp_tw_reuse/tcp_tw_recycle with ab. I get commBind errors due to connections in TIME_WAIT. Any tuning options suggested for RHEL6 x64? Jenny Couple of things to note. Firstly, that was an ab (apache bench) reported figure. It calculates the software limitation based on speed of transactions done, not necessarily accounting for things like TIME_WAIT. Particularly if it was extrapolated from, say, 50K requests, which would not hit that OS limit. He also mentioned using a "local IP address". If that was on the lo interface it would not be subject to things like TIME_WAIT or RTT lag. The test was also specific to the very long lists of non-matching regex ACLs he apparently used. Once those were eliminated the test showed much faster numbers, but a similar worker pattern. Overall, useful info for us regarding worker load sharing, and a bit of a warning for people writing long lists of regex ACLs. But the ACL issue was not really surprising. HTH Amos -- Please be using Current Stable Squid 2.7.STABLE9 or 3.1.12 Beta testers wanted for 3.2.0.8 and 3.1.12.2
RE: [squid-users] squid 3.2.0.5 smp scaling issues
On Sat, Jun 11, 2011 at 9:40 PM, Jenny Lee wrote: I like to know how you are able to do >13000 requests/sec. tcp_fin_timeout is 60 seconds default on all *NIXes and available ephemeral port range is 64K. I can't do more than 1K requests/sec even with tcp_tw_reuse/tcp_tw_recycle with ab. I get commBind errors due to connections in TIME_WAIT. Any tuning options suggested for RHEL6 x64? Jenny I would have a concern using both of those at the same time, reuse and recycle. Reuse a socket, but recycle it: I've seen issues when testing my own Linux distros with both of these settings. Right or wrong, that was my experience. fin_timeout: if you have a good connection, there should be no reason that a system takes 60 seconds to send out a fin. Cut that in half, if not by 2/3rds. And what is your limitation at 1K requests/sec: load (if so, look at I/O)? Network saturation? Maybe I missed an earlier thread, and I too would tilt my head at 13K requests/sec! Tory --- As I mentioned, my limitation is the ephemeral ports tied up with TIME_WAIT. The TIME_WAIT issue is a known factor when you are doing testing. When you are tuning, you apply options one at a time. tw_reuse/tw_recycle were not used together, and I had a 10 sec fin_timeout which made no difference. Jenny nb: I still don't know how to do indenting/quoting with this hotmail... after 10 years.
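A quick way to confirm that TIME_WAIT, rather than CPU or network saturation, is the limit while a bench runs (standard Linux tools; a sketch, not commands from the thread):

ss -s                                          # socket summary, including the timewait count
netstat -ant | grep -c TIME_WAIT               # sockets currently stuck in TIME_WAIT
cat /proc/sys/net/ipv4/ip_local_port_range     # ephemeral ports available to the client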
[squid-users] squid 3.2.0.5 smp scaling issues
I like to know how you are able to do >13000 requests/sec. tcp_fin_timeout is 60 seconds default on all *NIXes and available ephemeral port range is 64K. I can't do more than 1K requests/sec even with tcp_tw_reuse/tcp_tw_recycle with ab. I get commBind errors due to connections in TIME_WAIT. Any tuning options suggested for RHEL6 x64? Jenny

---

test setup: box A runs apache and ab; testing against its local IP address gives >13000 requests/sec. box B runs squid on 8 x 2.3 GHz Opteron cores with 16G RAM. The non-ACL/cache-peer lines in the config are:

http_port 8000
icp_port 0
visible_hostname gromit1
cache_effective_user proxy
cache_effective_group proxy
append_domain .invalid.server.name
pid_filename /var/run/squid.pid
cache_dir null /tmp
client_db off
cache_access_log syslog squid
cache_log /var/log/squid/cache.log
cache_store_log none
coredump_dir none
no_cache deny all

results when requesting a short html page:

squid 3.0.STABLE12            4200 requests/sec
squid 3.1.11                  2100 requests/sec
squid 3.2.0.5   1 worker      1400 requests/sec
squid 3.2.0.5   2 workers     2100 requests/sec
squid 3.2.0.5   3 workers     2500 requests/sec
squid 3.2.0.5   4 workers     2900 requests/sec
squid 3.2.0.5   5 workers     2900 requests/sec
squid 3.2.0.5   6 workers     2500 requests/sec
squid 3.2.0.5   7 workers     2000 requests/sec
squid 3.2.0.5   8 workers     1900 requests/sec

in all these tests the squid process was using 100% of the cpu. I also tried pulling a large file (100K instead of <50 bytes) on the thought that the short-page test might be bottlenecked on accepting connections, and that squid might do better with requests that take longer to service. Instead I found that with 8 workers, all 8 were using <50% of the CPU at 1000 requests/sec (the local machine would do 7000 requests/sec to itself): 1 worker did 500 requests/sec, 2 workers did 957 requests/sec, and from there it stayed at about 1000 requests/sec, with cpu utilization slowly dropping off (but not dropping as fast as it should with the number of cores available). so it looks like there is some significant bottleneck in version 3.2 that makes the SMP support fairly ineffective.

in reading the wiki page at wiki.squid-cache.org/Features/SmpScale I see you worrying about fairness between workers. If you have put in code to try and ensure fairness, you may want to remove it and see what happens to performance. what you are describing on that page in terms of fairness is what I would expect from a 'first-come-first-served' approach to multiple processes grabbing new connections: the worker that last ran is hot in the cache and so has an 'unfair' advantage in noticing and processing the new request, but as that worker gets busier it spends more time servicing requests, and the other processes get more of a chance to grab the new connection. so it will appear unfair under light load but become more fair under heavy load. David Lang
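For anyone reproducing this: the worker counts above come from the squid 3.2 workers directive; cpu_affinity_map is the companion 3.2 directive for pinning workers to cores. An illustrative sketch, not the exact configuration used in these tests:

workers 4
cpu_affinity_map process_numbers=1,2,3,4 cores=1,2,3,4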
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Wed, 4 May 2011 16:36:08 -0700 (PDT), da...@lang.hm wrote: that's why I'm speaking up. I just have not known what to test. are there other types of ACLs that I should be testing? We can't answer that without having seen your config file and which ACLs are in use now. The list of all available ACLs is at http://wiki.squid-cache.org/SquidFaq/SquidAcl and http://www.squid-cache.org/Doc/config/acl/ I'll setup some tests with different numbers of ACLs. since I've already verified that the number of ACLs defined isn't the significant factor, only the number tested before one succeeds (by moving the ACL that allows my access from the end of the file to the beginning of the file, keeping everything else the same), I'll see if the slowdown seems proportional to the number of rules, or if there is something else going on. any other types of testing I should do? The above looks like a good benchmark *provided* all the ACLs have the same type with consistent content counts. Mixing types makes the result non-comparable with other tests. If you have time (and want to), we kind of need that type of benchmarking done for each ACL type. Prioritising by popularity: src/dst by IP, port, domain and regex variants. Then proxy_auth, external (the "fake" helpers can help here). Then the others; ie browser, proto, method, header matching. We know general fuzzy details like, for example, that a port test is faster than a domain test, and one with details presented up front by the client is faster than one where a lookup is needed. But we have no deeper info to say if a dstdomain test is faster or slower than a src (IP) test. Way down my TODO list is the dream of micro-benchmarking the ACLs in their unit-tests. Amos
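A per-type run in that spirit could use a config like this sketch (names and values are invented for illustration): define one ACL of each candidate type, keep content counts equal, and swap which ACL the http_access line references between runs so only one type is exercised at a time.

acl bench_ip    src 10.1.0.0/16
acl bench_port  port 8000
acl bench_dom   dstdomain .one.test.invalid
acl bench_regex dstdom_regex -i \.two\.test\.invalid$
http_access allow bench_ip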
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Wed, 4 May 2011, Alex Rousskov wrote: On 05/04/2011 12:49 PM, da...@lang.hm wrote: I don't know how many developers are working on squid, so I don't know if you are the only person who can do this sort of work or not. I am sure there are others who can do this. The question is whether you can quickly find somebody interested enough to spend their time on your problem. In general, folks work on issues that are important to them or to their customers. Most active developers donate a lot of free time, but it still tends to revolve around issues they care about for one reason or another. We all have to prioritize. I do understand this. do you think that I should join the squid-dev list? I believe your messages are posted to squid-dev so you are not going to reach a wider audience if you do. If you want to write Squid code, joining is a good idea! I don't really have the time to do coding on this project. IMHO, you can maximize your chances of getting free help by isolating the problem better. For example, perhaps you can try to reproduce it with different kinds of fast ACLs (the simpler the better!). This will help clarify whether the problem is specific to IPv6, IP, or ACLs in general. Test different numbers of ACLs: does the problem happen only when the number of simple ACLs is huge? Make the problem easier to reproduce by posting configuration files (including Polygraph workloads or options for some other benchmarking tool you use). This is not a guarantee that somebody will jump and help you, but fixing a well-triaged issue is often much easier. that's why I'm speaking up. I just have not known what to test. are there other types of ACLs that I should be testing? I'll setup some tests with different numbers of ACLs. since I've already verified that the number of ACLs defined isn't the significant factor, only the number tested before one succeeds (by moving the ACL that allows my access from the end of the file to the beginning of the file, keeping everything else the same), I'll see if the slowdown seems proportional to the number of rules, or if there is something else going on. any other types of testing I should do? David Lang
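One way to build such configs is a generator script along these lines: it emits N never-matching deny rules ahead of the real allow, so every request must be evaluated against all N ACLs before it succeeds. (A sketch: 192.0.2.0/24 is TEST-NET-1 and should never match a real client, and the "localnet" ACL is assumed to be defined elsewhere in squid.conf.)

#!/bin/sh
# usage: sh gen-acls.sh 1000 >> squid.conf
N=${1:-1000}
i=1
while [ "$i" -le "$N" ]; do
    echo "acl dummy$i src 192.0.2.$(( (i % 254) + 1 ))/32"
    echo "http_access deny dummy$i"
    i=$((i + 1))
done
echo "http_access allow localnet"

Moving the allow line ahead of the generated rules gives the fast baseline for comparison.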
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Wed, 4 May 2011 11:49:01 -0700 (PDT), da...@lang.hm wrote: I don't know how many developers are working on squid, so I don't know if you are the only person who can do this sort of work or not. 4 part-timers and a few others focused on specific areas. do you think that I should join the squid-dev list? I thought you had. If you are intending to follow this for long, it could be a good idea anyway. If you have any time to spare on tinkering with optimizations, even better. Amos
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On 05/04/2011 12:49 PM, da...@lang.hm wrote: > I don't know how many developers are working on squid, so I don't know > if you are the only person who can do this sort of work or not. I am sure there are others who can do this. The question is whether you can quickly find somebody interested enough to spend their time on your problem. In general, folks work on issues that are important to them or to their customers. Most active developers donate a lot of free time, but it still tends to revolve around issues they care about for one reason or another. We all have to prioritize. > do you think that I should join the squid-dev list? I believe your messages are posted to squid-dev so you are not going to reach a wider audience if you do. If you want to write Squid code, joining is a good idea! IMHO, you can maximize your chances of getting free help by isolating the problem better. For example, perhaps you can try to reproduce it with different kinds of fast ACLs (the simpler the better!). This will help clarify whether the problem is specific to IPv6, IP, or ACLs in general. Test different numbers of ACLs: does the problem happen only when the number of simple ACLs is huge? Make the problem easier to reproduce by posting configuration files (including Polygraph workloads or options for some other benchmarking tool you use). This is not a guarantee that somebody will jump and help you, but fixing a well-triaged issue is often much easier. HTH, Alex.
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
I don't know how many developers are working on squid, so I don't know if you are the only person who can do this sort of work or not. do you think that I should join the squid-dev list? David Lang On Wed, 4 May 2011, Alex Rousskov wrote: On 05/04/2011 11:41 AM, da...@lang.hm wrote: anything new on this issue? (including any patches for me to test?) If you mean the "ACLs do not scale well" issue, then I do not have any free cycles to work on it right now. I was happy to clarify the new SMP architecture and suggest ways to triage the issue further. Let's hope somebody else can volunteer to do the required legwork. Alex. On Mon, 25 Apr 2011, da...@lang.hm wrote: Date: Mon, 25 Apr 2011 17:14:52 -0700 (PDT) From: da...@lang.hm To: Alex Rousskov Cc: Marcos , squid-users@squid-cache.org, squid-...@squid-cache.org Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues On Mon, 25 Apr 2011, Alex Rousskov wrote: On 04/25/2011 05:31 PM, da...@lang.hm wrote: On Mon, 25 Apr 2011, da...@lang.hm wrote: On Mon, 25 Apr 2011, Alex Rousskov wrote: On 04/14/2011 09:06 PM, da...@lang.hm wrote: In addition, there seems to be some sort of locking between the multiple worker processes in 3.2 when checking the ACLs. There are pretty much no locks in the current official SMP code. This will change as we start adding shared caches in a week or so, but even then the ACLs will remain lock-free. There could be some internal locking in the 3rd-party libraries used by ACLs (regex and such), but I do not know much about them. what are the 3rd party libraries that I would be using? See "ldd squid". Here is a sample based on a randomly picked Squid: libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol. Please note that I am not saying that any of these have problems in an SMP environment. I am only saying that Squid itself does not lock anything at runtime, so if our suspect is SMP-related locks, they would have to reside elsewhere. The other possibility is that we should suspect something else, of course. IMHO, it is more likely to be something else: after all, Squid does not use threads, where such problems are expected. BTW, do you see more-or-less even load across CPU cores? If not, you may need a patch that we find useful on older Linux kernels. It is discussed in the "Will similar workers receive similar amount of work?" section of http://wiki.squid-cache.org/Features/SmpScale the load is pretty even across all workers. with the problems described on that page, I would expect uneven utilization at low loads, but at high loads (with the workers busy servicing requests rather than waiting for new connections), I would expect the work to even out (and the types of hacks described in that section to end up costing performance, but not in a way that would scale with the ACL processing load). one thought I had is that this could be locking on name lookups. how hard would it be to create a quick patch that would bypass the name lookups entirely and only do the lookups by IP? I did not realize your ACLs use DNS lookups. Squid internal DNS code does not have any runtime SMP locks. However, the presence of DNS lookups increases the number of suspects. they don't, everything in my test environment is by IP. But I've seen other software that still runs everything through name lookups, even if what's presented to the software (both in what's requested and in the ACLs) is all done by IPs.
It's an easy way to bullet-proof the input (if it's a name it gets resolved; if it's an IP, the IP comes back as-is, and it works for IPv4 and IPv6, with no need for logic that looks at the value and tries to figure out whether the user intended to type a name or an IP). I don't know how squid works internally (it's a pretty large codebase, and I haven't tried to really dive into it), so I don't know if squid does this or not. A patch you propose does not sound difficult to me, but since I cannot contribute such a patch soon, it is probably better to test with ACLs that do not require any DNS lookups instead. if that regains the speed and/or scalability it would point fingers fairly conclusively at the DNS components. this is the only thing that I can think of that should be shared between multiple workers processing ACLs but is _not_ currently shared from Squid's point of view. Ok, I was assuming from the description of things that there would be one DNS process that all the workers would be accessing. from the way it's described in the documentation it sounds as if it's already a separate process, so I was thinking it possible that if each ACL IP address is being put through a single DNS process, I could be running into contention on that process (and having to do name lookups for IPv6 and then falling back to IPv4 would explain the severe performance hit far more than the difference between IPs being 128-bit values instead of 32-bit values). David Lang
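The resolve-or-pass-through pattern described here, reduced to a shell sketch (IPv4-only and purely illustrative; this says nothing about how Squid itself is structured):

# print the argument unchanged if it looks like a dotted quad, otherwise resolve it
resolve() {
    case "$1" in
        *[!0-9.]*) getent ahosts "$1" | awk 'NR==1 {print $1}' ;;   # has non-IP characters: treat as a name
        *)         printf '%s\n' "$1" ;;                            # an IPv4 literal: pass it through
    esac
}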
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On 05/04/2011 11:41 AM, da...@lang.hm wrote:
> anything new on this issue? (including any patches for me to test?)

If you mean the "ACLs do not scale well" issue, then I do not have any free cycles to work on it right now. I was happy to clarify the new SMP architecture and suggest ways to triage the issue further. Let's hope somebody else can volunteer to do the required legwork.

Alex.

> On Mon, 25 Apr 2011, da...@lang.hm wrote: [...]
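One concrete way to run the DNS-free triage test suggested earlier in this thread is to restrict the ruleset to literal addresses and ports, for example (the ACL names here are made up; src and port ACLs with bare IPs never consult the resolver, and dst stays resolver-free as long as the benchmark requests use IP-literal URLs):

acl bench_src src 10.1.1.0/24
acl bench_dst dst 10.2.2.2
acl bench_port port 80 3128
http_access allow bench_src bench_dst bench_port
http_access deny all

If that configuration scales with workers while the name-based one does not, the DNS path becomes the prime suspect.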
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
ping, anything new on this issue? (including any patches for me to test?)

David Lang

On Mon, 25 Apr 2011, da...@lang.hm wrote: [...]
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Mon, 25 Apr 2011, Alex Rousskov wrote:

> On 04/25/2011 06:14 PM, da...@lang.hm wrote: [...]
>
> I would like to fix that documentation, but I cannot find what phrase led you to the above conclusion. The SmpScale wiki page says:
>
>> Currently, Squid workers do not share and do not synchronize other resources or services, including:
>>
>> * DNS caches (ipcache and fqdncache);
>
> So that seems to be correct and clear. Which documentation are you referring to?

ahh, I missed that. I was going by the description of the config options that configure and disable the DNS cache (they don't say anything about SMP mode, but I read them to imply that the squid-internal DNS cache was a separate thread/process)

David Lang
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On 04/25/2011 06:14 PM, da...@lang.hm wrote:
>>> if that regains the speed and/or scalability it would point fingers
>>> fairly conclusively at the DNS components.
>>>
>>> this is the only thing that I can think of that should be shared between
>>> multiple workers processing ACLs
>>
>> but it is _not_ currently shared from Squid's point of view.
>
> Ok, I was assuming from the description of things that there would be
> one DNS process that all the workers would be accessing. from the way
> it's described in the documentation it sounds as if it's already a
> separate process

I would like to fix that documentation, but I cannot find what phrase led you to the above conclusion. The SmpScale wiki page says:

> Currently, Squid workers do not share and do not synchronize other
> resources or services, including:
>
> * DNS caches (ipcache and fqdncache);

So that seems to be correct and clear. Which documentation are you referring to?

Thank you,

Alex.
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Mon, 25 Apr 2011, Alex Rousskov wrote:

On 04/25/2011 05:31 PM, da...@lang.hm wrote:

On Mon, 25 Apr 2011, da...@lang.hm wrote:

On Mon, 25 Apr 2011, Alex Rousskov wrote:

On 04/14/2011 09:06 PM, da...@lang.hm wrote:

In addition, there seems to be some sort of locking between the multiple worker processes in 3.2 when checking the ACLs

There are pretty much no locks in the current official SMP code. This will change as we start adding shared caches in a week or so, but even then the ACLs will remain lock-free. There could be some internal locking in the 3rd-party libraries used by ACLs (regex and such), but I do not know much about them.

what are the 3rd party libraries that I would be using?

See "ldd squid". Here is a sample based on a randomly picked Squid:

libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol

Please note that I am not saying that any of these have problems in an SMP environment. I am only saying that Squid itself does not lock anything at runtime, so if our suspect is SMP-related locks, they would have to reside elsewhere. The other possibility is that we should suspect something else, of course. IMHO, it is more likely to be something else: after all, Squid does not use threads, where such problems are expected.

BTW, do you see more-or-less even load across CPU cores? If not, you may need a patch that we find useful on older Linux kernels. It is discussed in the "Will similar workers receive similar amount of work?" section of http://wiki.squid-cache.org/Features/SmpScale

the load is pretty even across all workers.

with the problems described on that page, I would expect uneven utilization at low loads, but at high loads (with the workers busy servicing requests rather than waiting for new connections), I would expect the work to even out (and the types of hacks described in that section to end up costing performance, but not in a way that would scale with the ACL processing load)

one thought I had is that this could be locking on name lookups. how hard would it be to create a quick patch that would bypass the name lookups entirely and only do the lookups by IP?

I did not realize your ACLs use DNS lookups. Squid internal DNS code does not have any runtime SMP locks. However, the presence of DNS lookups increases the number of suspects.

they don't, everything in my test environment is by IP. But I've seen other software that still runs everything through name lookups, even if what's presented to the software (both in what's requested and in the ACLs) is all done by IPs. It's an easy way to bullet-proof the input (if it's a name it gets resolved, if it's an IP, the IP comes back as-is, and it works for IPv4 and IPv6, no need to have logic that looks at the value and tries to figure out whether the user intended to type a name or an IP). I don't know how squid works internally (it's a pretty large codebase, and I haven't tried to really dive into it) so I don't know if squid does this or not.

A patch you propose does not sound difficult to me, but since I cannot contribute such a patch soon, it is probably better to test with ACLs that do not require any DNS lookups instead.

if that regains the speed and/or scalability it would point fingers fairly conclusively at the DNS components. this is the only thing that I can think of that should be shared between multiple workers processing ACLs

but it is _not_ currently shared from Squid's point of view.

Ok, I was assuming from the description of things that there would be one DNS process that all the workers would be accessing. from the way it's described in the documentation it sounds as if it's already a separate process, so I was thinking that it was possible that if each ACL IP address is being put through a single DNS process, I could be running into contention on that process (and having to do name lookups for IPv6 and then falling back to IPv4 would explain the severe performance hit far more than the difference between IPs being 128-bit values instead of 32-bit values)

David Lang
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On 04/25/2011 05:31 PM, da...@lang.hm wrote:
> On Mon, 25 Apr 2011, da...@lang.hm wrote:
>> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>>> On 04/14/2011 09:06 PM, da...@lang.hm wrote:
>>> In addition, there seems to be some sort of locking between the multiple
>>> worker processes in 3.2 when checking the ACLs
>>>
>>> There are pretty much no locks in the current official SMP code. This
>>> will change as we start adding shared caches in a week or so, but even
>>> then the ACLs will remain lock-free. There could be some internal
>>> locking in the 3rd-party libraries used by ACLs (regex and such), but I
>>> do not know much about them.
>>
>> what are the 3rd party libraries that I would be using?

See "ldd squid". Here is a sample based on a randomly picked Squid:

libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol

Please note that I am not saying that any of these have problems in an SMP environment. I am only saying that Squid itself does not lock anything at runtime, so if our suspect is SMP-related locks, they would have to reside elsewhere. The other possibility is that we should suspect something else, of course. IMHO, it is more likely to be something else: after all, Squid does not use threads, where such problems are expected.

BTW, do you see more-or-less even load across CPU cores? If not, you may need a patch that we find useful on older Linux kernels. It is discussed in the "Will similar workers receive similar amount of work?" section of http://wiki.squid-cache.org/Features/SmpScale

> one thought I had is that this could be locking on name lookups. how
> hard would it be to create a quick patch that would bypass the name
> lookups entirely and only do the lookups by IP?

I did not realize your ACLs use DNS lookups. Squid internal DNS code does not have any runtime SMP locks. However, the presence of DNS lookups increases the number of suspects.

A patch you propose does not sound difficult to me, but since I cannot contribute such a patch soon, it is probably better to test with ACLs that do not require any DNS lookups instead.

> if that regains the speed and/or scalability it would point fingers
> fairly conclusively at the DNS components.
>
> this is the only thing that I can think of that should be shared between
> multiple workers processing ACLs

but it is _not_ currently shared from Squid's point of view.

Cheers, Alex.
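If lock contention in some library is the suspect, one quick (Linux-specific) way to look for it is to count futex system calls in one busy worker while the benchmark runs, for example:

strace -c -f -e trace=futex -p <pid-of-one-worker>

If the workers make few or no futex calls while throughput still collapses as workers are added, userspace locks are probably not the bottleneck and something else (DNS, the kernel, the shared listening socket) deserves the attention.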
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Mon, 25 Apr 2011, da...@lang.hm wrote:

On Mon, 25 Apr 2011, Alex Rousskov wrote:

On 04/14/2011 09:06 PM, da...@lang.hm wrote:

In addition, there seems to be some sort of locking between the multiple worker processes in 3.2 when checking the ACLs

There are pretty much no locks in the current official SMP code. This will change as we start adding shared caches in a week or so, but even then the ACLs will remain lock-free. There could be some internal locking in the 3rd-party libraries used by ACLs (regex and such), but I do not know much about them.

what are the 3rd party libraries that I would be using?

one thought I had is that this could be locking on name lookups. how hard would it be to create a quick patch that would bypass the name lookups entirely and only do the lookups by IP?

if that regains the speed and/or scalability it would point fingers fairly conclusively at the DNS components. this is the only thing that I can think of that should be shared between multiple workers processing ACLs

David Lang
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Mon, 25 Apr 2011, Alex Rousskov wrote:

On 04/14/2011 09:06 PM, da...@lang.hm wrote:

Ok, I finally got a chance to test 2.7STABLE9

it performs about the same as squid 3.0, possibly a little better.

with my somewhat stripped down config (smaller regex patterns, replacing CIDR blocks and names that would need to be looked up in /etc/hosts with individual IP addresses)

2.7 gives ~4800 requests/sec
3.0 gives ~4600 requests/sec
3.2.0.6 with 1 worker gives ~1300 requests/sec
3.2.0.6 with 5 workers gives ~2800 requests/sec

Glad you did not see a significant regression between v2.7 and v3.0. We have heard rather different stories. Every environment is different, and many lab tests are misguided, of course, but it is still good to hear positive reports. The difference between v3.2 and v3.0 is known and has been discussed on squid-dev. A few specific culprits are also known, but more need to be identified. We are working on identifying these performance bugs and reducing that difference.

let me know if there are any tests that I can run that will help you.

As for the 1 versus 5 worker difference, it seems to be specific to your environment (as discussed below).

the numbers for 3.0 are slightly better than what I was getting with the full ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I got from the last round of tests (with either the full or simplified ruleset)

so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and the ability to use multiple worker processes in 3.2 doesn't make up for this. the time taken seems to almost all be in the ACL evaluation, as eliminating all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec.

If ACLs are the major culprit in your environment, then this is most likely not a problem in Squid source code. AFAIK, there are no locks or other synchronization primitives/overheads when it comes to Squid ACLs. The solution may lie in optimizing some 3rd-party libraries (used by ACLs) or in optimizing how they are used by Squid, depending on what ACLs you use. As far as Squid-specific code is concerned, you should see nearly linear ACL scaling with the number of workers.

given that my ACLs are IP/port matches or regex matches (and I've tested replacing the regex matches with IP matches with no significant change in performance), what components would be used?

one theory is that even though I have IPv6 disabled on this build, the added space and more expensive checks needed to compare IPv6 addresses instead of IPv4 addresses account for the single worker drop of ~66%. that seems rather expensive, even though there are 293 http_access lines (and one of them uses external file contents in its acls, so it's a total of ~2400 source/destination pairs; however, due to the ability to shortcut the comparison, the number of tests that need to be done should be <400)

Yes, IPv6 is one of the known major performance regression culprits, but IPv6 ACLs should still scale linearly with the number of workers, AFAICT. Please note that I am not an ACL expert. I am just talking from the overall Squid SMP design point of view and from our testing/deployment experience point of view.

that makes sense and is what I would have expected, but in my case (lots of ACLs) I am seeing a definite problem with more workers not completing more work, and beyond about 5 workers I am seeing the total work being completed drop. I can't think of any reason besides locking that this may be the case.

In addition, there seems to be some sort of locking between the multiple worker processes in 3.2 when checking the ACLs

There are pretty much no locks in the current official SMP code. This will change as we start adding shared caches in a week or so, but even then the ACLs will remain lock-free. There could be some internal locking in the 3rd-party libraries used by ACLs (regex and such), but I do not know much about them.

what are the 3rd party libraries that I would be using?

David Lang

HTH, Alex.

On Wed, 13 Apr 2011, Marcos wrote: [...]
Re: Res: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Mon, 25 Apr 2011, Marcos wrote:

thanks for your answer David. i'm seeing a lot of features being included in squid 3.x, but it's getting slower as new features are added.

that's unfortunately fairly normal.

i think squid 3.2 with 1 worker should be as fast as 2.7, but it's getting slower and hungrier.

that's one major problem, but the fact that the ACL matching isn't scaling with more workers I think is what's killing us. 1 3.2 worker is ~1/3 the speed of 2.7, but with the easy availability of 8+ real cores (not hyperthreaded 'fake' cores), you should still be able to get ~3x the performance of 2.7 by using 3.2. unfortunately that's not what's happening, and we end up topping out around 1/2-2/3 the performance of 2.7

David Lang

Marcos

- Original message
From: "da...@lang.hm"
To: Marcos
Cc: Amos Jeffries; squid-users@squid-cache.org; squid-...@squid-cache.org
Sent: Friday, 22 April 2011 15:10:44
Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

[...]
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On 04/14/2011 09:06 PM, da...@lang.hm wrote:
> Ok, I finally got a chance to test 2.7STABLE9
>
> it performs about the same as squid 3.0, possibly a little better.
>
> with my somewhat stripped down config (smaller regex patterns, replacing
> CIDR blocks and names that would need to be looked up in /etc/hosts with
> individual IP addresses)
>
> 2.7 gives ~4800 requests/sec
> 3.0 gives ~4600 requests/sec
> 3.2.0.6 with 1 worker gives ~1300 requests/sec
> 3.2.0.6 with 5 workers gives ~2800 requests/sec

Glad you did not see a significant regression between v2.7 and v3.0. We have heard rather different stories. Every environment is different, and many lab tests are misguided, of course, but it is still good to hear positive reports.

The difference between v3.2 and v3.0 is known and has been discussed on squid-dev. A few specific culprits are also known, but more need to be identified. We are working on identifying these performance bugs and reducing that difference.

As for the 1 versus 5 worker difference, it seems to be specific to your environment (as discussed below).

> the numbers for 3.0 are slightly better than what I was getting with the
> full ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I
> got from the last round of tests (with either the full or simplified
> ruleset)
>
> so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and
> the ability to use multiple worker processes in 3.2 doesn't make up for
> this.
>
> the time taken seems to almost all be in the ACL evaluation as
> eliminating all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec.

If ACLs are the major culprit in your environment, then this is most likely not a problem in Squid source code. AFAIK, there are no locks or other synchronization primitives/overheads when it comes to Squid ACLs. The solution may lie in optimizing some 3rd-party libraries (used by ACLs) or in optimizing how they are used by Squid, depending on what ACLs you use. As far as Squid-specific code is concerned, you should see nearly linear ACL scaling with the number of workers.

> one theory is that even though I have IPv6 disabled on this build, the
> added space and more expensive checks needed to compare IPv6 addresses
> instead of IPv4 addresses account for the single worker drop of ~66%.
> that seems rather expensive, even though there are 293 http_access lines
> (and one of them uses external file contents in its acls, so it's a
> total of ~2400 source/destination pairs, however due to the ability to
> shortcut the comparison the number of tests that need to be done should
> be <400)

Yes, IPv6 is one of the known major performance regression culprits, but IPv6 ACLs should still scale linearly with the number of workers, AFAICT. Please note that I am not an ACL expert. I am just talking from the overall Squid SMP design point of view and from our testing/deployment experience point of view.

> In addition, there seems to be some sort of locking between the multiple
> worker processes in 3.2 when checking the ACLs

There are pretty much no locks in the current official SMP code. This will change as we start adding shared caches in a week or so, but even then the ACLs will remain lock-free. There could be some internal locking in the 3rd-party libraries used by ACLs (regex and such), but I do not know much about them.

HTH, Alex.

>> On Wed, 13 Apr 2011, Marcos wrote: [...]
Res: Res: [squid-users] squid 3.2.0.5 smp scaling issues
thanks for your answer David. i'm seeing a lot of features being included in squid 3.x, but it's getting slower as new features are added. i think squid 3.2 with 1 worker should be as fast as 2.7, but it's getting slower and hungrier.

Marcos

- Original message
From: "da...@lang.hm"
To: Marcos
Cc: Amos Jeffries; squid-users@squid-cache.org; squid-...@squid-cache.org
Sent: Friday, 22 April 2011 15:10:44
Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

ping, I haven't seen a response to this additional information that I sent out last week.

squid 3.1 and 3.2 are a significant regression in performance from squid 2.7 or 3.0

David Lang

On Thu, 14 Apr 2011, da...@lang.hm wrote: [...]
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
ping, I haven't seen a response to this additional information that I sent out last week.

squid 3.1 and 3.2 are a significant regression in performance from squid 2.7 or 3.0

David Lang

On Thu, 14 Apr 2011, da...@lang.hm wrote: [...]
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
Ok, I finally got a chance to test 2.7STABLE9

it performs about the same as squid 3.0, possibly a little better.

with my somewhat stripped down config (smaller regex patterns, replacing CIDR blocks and names that would need to be looked up in /etc/hosts with individual IP addresses)

2.7 gives ~4800 requests/sec
3.0 gives ~4600 requests/sec
3.2.0.6 with 1 worker gives ~1300 requests/sec
3.2.0.6 with 5 workers gives ~2800 requests/sec

the numbers for 3.0 are slightly better than what I was getting with the full ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I got from the last round of tests (with either the full or simplified ruleset)

so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and the ability to use multiple worker processes in 3.2 doesn't make up for this.

the time taken seems to almost all be in the ACL evaluation, as eliminating all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec.

one theory is that even though I have IPv6 disabled on this build, the added space and more expensive checks needed to compare IPv6 addresses instead of IPv4 addresses account for the single worker drop of ~66%. that seems rather expensive, even though there are 293 http_access lines (and one of them uses external file contents in its acls, so it's a total of ~2400 source/destination pairs; however, due to the ability to shortcut the comparison, the number of tests that need to be done should be <400)

In addition, there seems to be some sort of locking between the multiple worker processes in 3.2 when checking the ACLs, as the test with almost no ACLs scales close to 100% per worker, while with the ACLs it scales much more slowly, and above 4-5 workers actually drops off dramatically (to the point where with 8 workers the throughput is down to about what you get with 1-2 workers). I don't see any conceptual reason why the ACL checks of the different worker threads should impact each other in any way, let alone in a way that limits scalability to ~4 workers before adding more workers is a net loss.

David Lang

On Wed, 13 Apr 2011, Marcos wrote:

Hi David,

could you run and publish your benchmark with squid 2.7 ??? i'd like to know if there is any regression between the 2.7 and 3.x series.

thanks.

Marcos

- Original message
From: "da...@lang.hm"
To: Amos Jeffries
Cc: squid-users@squid-cache.org; squid-...@squid-cache.org
Sent: Saturday, 9 April 2011 12:56:12
Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues

On Sat, 9 Apr 2011, Amos Jeffries wrote:

On 09/04/11 14:27, da...@lang.hm wrote:

A couple more things about the ACLs used in my test

all of them are allow ACLs (no deny rules to worry about precedence of) except for a deny-all at the bottom

the ACL line that permits the test source to the test destination has zero overlap with the rest of the rules

every rule has an IP based restriction (even the ones with url_regex are source -> URL regex)

I moved the ACL that allows my test from the bottom of the ruleset to the top and the resulting performance numbers were up as if the other ACLs didn't exist. As such it is very clear that 3.2 is evaluating every rule.

I changed one of the url_regex rules to just match one line rather than a file containing 307 lines to see if that made a difference, and it made no significant difference. So this indicates to me that it's not having to fully evaluate every rule (it's able to skip doing the regex if the IP match doesn't work)

I then changed all the acl lines that used hostnames to have IP addresses in them, and this also made no significant difference

I then changed all subnet matches to single IP addresses (just nuked /## throughout the config file) and this also made no significant difference.

Squid has always worked this way. It will *test* every rule from the top down to the one that matches. Also testing each line left-to-right until one fails or the whole line matches.

so why are the address matches so expensive?

3.0 and older: IP address is a 32-bit comparison. 3.1 and newer: IP address is a 128-bit comparison with memcmp(). If something like a word-wise comparison can be implemented faster than memcmp() we would welcome it.

I wonder if there should be a different version that's used when IPv6 is disabled. this is a pretty large hit. if the data is aligned properly, on a 64 bit system this should still only be 2 compares. do you do any alignment on the data now?

and as noted in the e-mail below, why do these checks not scale nicely with the number of worker processes? If they did, the fact that one 3.2 process is about 1/3 the speed of a 3.0 process in checking the acls wouldn't matter nearly as much when it's so easy to get an 8+ core system.

There you have the unknown.

I think this is a fairly critical thing to figure out.
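To make the "2 compares" idea concrete, a word-wise equality test over a 128-bit address might look like this (a sketch in C, not Squid's actual code, assuming addresses are stored as 16 raw bytes; the memcpy sidesteps alignment requirements and most compilers turn it into plain loads):

#include <stdint.h>
#include <string.h>

/* Compare two 128-bit (IPv6-sized) addresses as two 64-bit words.
 * Equivalent to memcmp(a, b, 16) == 0, but done in two compares. */
static int addr16_equal(const unsigned char a[16], const unsigned char b[16])
{
    uint64_t a_hi, a_lo, b_hi, b_lo;

    memcpy(&a_hi, a, 8);       /* memcpy avoids unaligned access issues */
    memcpy(&a_lo, a + 8, 8);
    memcpy(&b_hi, b, 8);
    memcpy(&b_lo, b + 8, 8);

    return a_hi == b_hi && a_lo == b_lo;
}

For ordering (less/greater) rather than equality, the words would need byte-swapping to host order first, which may be part of why memcmp() is the safe general answer; either way, a comparison this cheap seems unlikely to explain a 3x single-worker slowdown on its own.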
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
sorry, haven't had time to do that yet. I will try and get this done today.

David Lang

On Wed, 13 Apr 2011, Marcos wrote: [...]
Res: [squid-users] squid 3.2.0.5 smp scaling issues
Hi David,

could you run and publish your benchmark with squid 2.7 ??? i'd like to know if there is any regression between the 2.7 and 3.x series.

thanks.

Marcos

- Original message
From: "da...@lang.hm"
To: Amos Jeffries
Cc: squid-users@squid-cache.org; squid-...@squid-cache.org
Sent: Saturday, 9 April 2011 12:56:12
Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues

On Sat, 9 Apr 2011, Amos Jeffries wrote: [...]
Re: [squid-users] squid 3.2.0.5 smp scaling issues
On Sat, 9 Apr 2011, Amos Jeffries wrote:

On 09/04/11 14:27, da...@lang.hm wrote:

A couple more things about the ACLs used in my test

all of them are allow ACLs (no deny rules to worry about precedence of) except for a deny-all at the bottom

the ACL line that permits the test source to the test destination has zero overlap with the rest of the rules

every rule has an IP based restriction (even the ones with url_regex are source -> URL regex)

I moved the ACL that allows my test from the bottom of the ruleset to the top and the resulting performance numbers were up as if the other ACLs didn't exist. As such it is very clear that 3.2 is evaluating every rule.

I changed one of the url_regex rules to just match one line rather than a file containing 307 lines to see if that made a difference, and it made no significant difference. So this indicates to me that it's not having to fully evaluate every rule (it's able to skip doing the regex if the IP match doesn't work)

I then changed all the acl lines that used hostnames to have IP addresses in them, and this also made no significant difference

I then changed all subnet matches to single IP addresses (just nuked /## throughout the config file) and this also made no significant difference.

Squid has always worked this way. It will *test* every rule from the top down to the one that matches. Also testing each line left-to-right until one fails or the whole line matches.

so why are the address matches so expensive?

3.0 and older: IP address is a 32-bit comparison. 3.1 and newer: IP address is a 128-bit comparison with memcmp(). If something like a word-wise comparison can be implemented faster than memcmp() we would welcome it.

I wonder if there should be a different version that's used when IPv6 is disabled. this is a pretty large hit. if the data is aligned properly, on a 64 bit system this should still only be 2 compares. do you do any alignment on the data now?

and as noted in the e-mail below, why do these checks not scale nicely with the number of worker processes? If they did, the fact that one 3.2 process is about 1/3 the speed of a 3.0 process in checking the acls wouldn't matter nearly as much when it's so easy to get an 8+ core system.

There you have the unknown.

I think this is a fairly critical thing to figure out.

it seems to me that all accept/deny rules in a set should be able to be combined into a tree to make searching them very fast.

so for example if you have

accept 1
accept 2
deny 3
deny 4
accept 5

you need to create three trees (one with accept 1 and accept 2, one with deny 3 and deny 4, and one with accept 5) and then check each tree to see if you have a match.

the types of match could be done in order of increasing cost, so if you have acl entries of type port, src, dst, and url regex, organize the tree so that you check ports first, then src, then dst, then only if all that matches do you need to do the regex. This would be very similar to the shortcut logic that you use today with a single rule where you bail out when you don't find a match.

The config file is a specific structure configured by the admin under guaranteed rules of operation for access lines (top-down, left-to-right, first-match-wins) to perform boolean-logic calculations using ACL sets. Sorting access line rules is not an option. Sorting ACL values and tree-forming them is already done (regex being the one exception AFAIK). Sorting position-wise on a single access line is also ruled out by interactions with deny_info, auth and external ACL types.

It would seem that as long as you don't cross boundaries between the different types, you should be able to optimize within a group. using my example above, you couldn't combine the 'accept 5' with any of the other accepts, but you could combine accept 1 and 2 and combine deny 3 and 4 together.

now, I know that I don't fully understand all the possible ACL types, so this may not work for some of them, but I believe that a fairly common use case is to have either a lot of allow rules, or a lot of deny rules as a block (either a list of sites you are allowed to access, or a list of sites that are blocked), so an ability to optimize these use cases may be well worth it.

you could go with a complex tree structure, but since this only needs to be changed at boot time,

Um, "boot"/startup time and arbitrary "-k reconfigure" times. With a reverse-configuration display dump on any cache manager request.

still a pretty rare case, and one where you can build a completely new ruleset and swap it out. My point was that this isn't something that you have to be able to update dynamically.

it seems to me that a simple array that you can do a binary search on will work for the port, src, and dst trees. The url regex is probably easiest to initially create by just doing a list of regex strings to match and working down that list.

This is already how we do these.
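As an illustration of the grouping idea being debated here (a sketch only; the group/request types are hypothetical stand-ins, not Squid structures), adjacent rules with the same action can be merged into one set that is probed with a single lookup while preserving top-down, first-match-wins semantics across the groups:

#include <stdbool.h>
#include <stddef.h>

struct request;   /* stand-in for whatever describes the transaction */

enum action { ALLOW, DENY };

/* One group = a run of adjacent access rules that share an action.
 * matches() would probe a sorted array or hash of that run's values. */
struct group {
    enum action action;
    bool (*matches)(const struct group *g, const struct request *req);
};

/* First-match-wins across groups: same answer as checking the original
 * rules one by one, but each run of like rules costs one probe. */
enum action check_access(const struct group *groups, size_t n,
                         const struct request *req)
{
    for (size_t i = 0; i < n; i++)
        if (groups[i].matches(&groups[i], req))
            return groups[i].action;
    return DENY;  /* the implicit deny-all at the bottom */
}

With the example above this yields three groups ({accept 1, accept 2}, {deny 3, deny 4}, {accept 5}); correctness only requires that merging never crosses an allow/deny boundary.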
Re: [squid-users] squid 3.2.0.5 smp scaling issues
On 09/04/11 14:27, da...@lang.hm wrote:

A couple more things about the ACLs used in my test: all of them are allow ACLs (no deny rules to worry about precedence of) except for a deny-all at the bottom; the ACL line that permits the test source to the test destination has zero overlap with the rest of the rules; and every rule has an IP-based restriction (even the ones with url_regex are source -> URL regex).

I moved the ACL that allows my test from the bottom of the ruleset to the top, and the resulting performance numbers were up as if the other ACLs didn't exist. As such it is very clear that 3.2 is evaluating every rule.

I changed one of the url_regex rules to just match one line rather than a file containing 307 lines to see if that made a difference, and it made no significant difference. So this indicates to me that it's not having to fully evaluate every rule (it's able to skip doing the regex if the IP match doesn't work).

I then changed all the acl lines that used hostnames to have IP addresses in them, and this also made no significant difference. I then changed all subnet matches to single IP addresses (just nuked /## throughout the config file), and this also made no significant difference.

Squid has always worked this way. It will *test* every rule from the top down to the one that matches. Also testing each line left-to-right until one fails or the whole line matches.

so why are the address matches so expensive

3.0 and older: IP address is a 32-bit comparison. 3.1 and newer: IP address is a 128-bit comparison with memcmp(). If something like a word-wise comparison can be implemented faster than memcmp() we would welcome it.

and as noted in the e-mail below, why do these checks not scale nicely with the number of worker processes? If they did, the fact that one 3.2 process is about 1/3 the speed of a 3.0 process in checking the acls wouldn't matter nearly as much when it's so easy to get an 8+ core system.

There you have the unknown.

it seems to me that all accept/deny rules in a set should be able to be combined into a tree to make searching them very fast. So for example if you have

accept 1
accept 2
deny 3
deny 4
accept 5

you need to create three trees (one with accept 1 and accept 2, one with deny 3 and deny 4, and one with accept 5) and then check each tree to see if you have a match. The types of match could be done in order of increasing cost, so if you

The config file is specific structure configured by the admin under guaranteed rules of operation for access lines (top-down, left-to-right, first-match-wins) to perform boolean-logic calculations using ACL sets. Sorting access line rules is not an option. Sorting ACL values and tree-forming them is already done (regex being the one exception AFAIK). Sorting position-wise on a single access line is also ruled out by interactions with deny_info, auth and external ACL types.

have acl entries of type port, src, dst, and url_regex, organize the tree so that you check ports first, then src, then dst, and only if all that matches do you need to do the regex. This would be very similar to the shortcut logic that you use today with a single rule, where you bail out when you don't find a match.

you could go with a complex tree structure, but since this only needs to be changed at boot time,

Um, "boot"/startup time and arbitrary "-k reconfigure" times. With a reverse-configuration display dump on any cache manager request.

it seems to me that a simple array that you can do a binary search on will work for the port, src, and dst trees.
The url_regex is probably easiest to initially create by just doing a list of regex strings to match and working down that list, but eventually it

This is already how we do these. But with a splay tree instead of binary.

may be best to create a parse tree so that you only have to walk down the string once to see if you have a match.

That would be nice. Care to implement? You just have to get the regex library to adjust its pre-compiled patterns with OR into (existing|new) whenever a new pattern string is added to an ACL.

you wouldn't quite be able to get this as fast, as you would have to actually do two checks: one if you have a match on that level, and one for the rules that don't specify something in the current tree (one check for if the http_access line specifies a port number and one for if it doesn't, for example)

We get around this problem by using C++ types. ACLChecklist walks the tree holding the current location, expected result, and all details available about the transaction. Each node in the tree has a match() function which gets called at most once per walk. Each ACL data type provides its own match() algorithm. That is why the following config is invalid:

acl foo src 1.2.3.4
acl foo port 80

this sort of acl structure would reduce a complex ruleset down to ~O(log n) instead of the current O(n) (a really complex ruleset would be log n of each tree ad
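The (existing|new) merging described above can be sketched like this (std::regex is used purely for illustration; Squid's actual regex handling differs): each pattern string added to the ACL is OR-ed into one compiled alternation, so a URL is scanned once instead of once per pattern:

    #include <regex>
    #include <string>
    #include <vector>

    // Fold many url_regex pattern strings into a single alternation.
    // Each pattern is wrapped in its own group: (p1)|(p2)|...
    std::regex buildMergedPattern(const std::vector<std::string> &patterns)
    {
        std::string merged;
        for (const auto &p : patterns) {
            if (!merged.empty())
                merged += '|';
            merged += '(' + p + ')';
        }
        return std::regex(merged, std::regex::optimize);
    }

    // Example usage (hypothetical patterns):
    //   std::regex re = buildMergedPattern({"\\.example\\.com/", "^http://ads\\."});
    //   bool matched = std::regex_search(url, re);

One caveat: the merged form only reports that *some* pattern matched, not *which* one, so it only suits cases where per-pattern identity is not needed afterwards.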
Re: [squid-users] squid 3.2.0.5 smp scaling issues
workers gets 11,300 requests/sec
3.2.0.6 with 4 workers gets 15,600 requests/sec
3.2.0.6 with 5 workers gets 15,800 requests/sec
3.2.0.6 with 6 workers gets 16,400 requests/sec

David Lang

On Fri, 8 Apr 2011, Amos Jeffries wrote:

Date: Fri, 08 Apr 2011 15:37:24 +1200
From: Amos Jeffries
To: squid-users@squid-cache.org
Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues

On 08/04/11 14:32, da...@lang.hm wrote:

sorry for the delay. I got a chance to do some more testing (slightly different environment on the apache server, so these numbers are a little lower for the same versions than the last ones I posted).

results when requesting a short html page:

squid 3.0.STABLE12 4000 requests/sec
squid 3.1.11 1500 requests/sec
squid 3.1.12 1530 requests/sec
squid 3.2.0.5 1 worker 1300 requests/sec
squid 3.2.0.5 2 workers 2050 requests/sec
squid 3.2.0.5 3 workers 2700 requests/sec
squid 3.2.0.5 4 workers 2950 requests/sec
squid 3.2.0.5 5 workers 2900 requests/sec
squid 3.2.0.5 6 workers 2530 requests/sec
squid 3.2.0.6 1 worker 1400 requests/sec
squid 3.2.0.6 2 workers 2050 requests/sec
squid 3.2.0.6 3 workers 2730 requests/sec
squid 3.2.0.6 4 workers 2950 requests/sec
squid 3.2.0.6 5 workers 2830 requests/sec
squid 3.2.0.6 6 workers 2530 requests/sec
squid 3.2.0.6 7 workers 2160 requests/sec (instead of all processes being at 100%, several were at 99%)
squid 3.2.0.6 8 workers 1950 requests/sec (instead of all processes being at 100%, some were as low as 92%)

so the new versions are really about the same.

moving to large requests cut these numbers by about 1/3, but the squid processes were not maxing out the CPU.

one issue I saw: I had to reduce the number of concurrent connections or I would have requests time out (3.2 vs earlier versions); on 3.2 I had to have -c on ab at ~100-150, where I could go significantly higher on 3.1 and 3.0.

David Lang

Thank you. So with small files: 2% on 3.1 and ~7% on 3.2 with a single worker. But under 1% on multiple 3.2 workers. And overloading/flooding the I/O bandwidth on large files.

NP: when overloading I/O one cannot compare runs with different sizes, only runs of the same traffic. Also only the CPU max load is reliable there, since requests/sec bottlenecks behind the I/O. So... your measure that CPU dropped is a good sign for large files.

Amos
Re: [squid-users] squid 3.2.0.5 smp scaling issues
On 08/04/11 14:32, da...@lang.hm wrote:

sorry for the delay. I got a chance to do some more testing (slightly different environment on the apache server, so these numbers are a little lower for the same versions than the last ones I posted).

results when requesting a short html page:

squid 3.0.STABLE12 4000 requests/sec
squid 3.1.11 1500 requests/sec
squid 3.1.12 1530 requests/sec
squid 3.2.0.5 1 worker 1300 requests/sec
squid 3.2.0.5 2 workers 2050 requests/sec
squid 3.2.0.5 3 workers 2700 requests/sec
squid 3.2.0.5 4 workers 2950 requests/sec
squid 3.2.0.5 5 workers 2900 requests/sec
squid 3.2.0.5 6 workers 2530 requests/sec
squid 3.2.0.6 1 worker 1400 requests/sec
squid 3.2.0.6 2 workers 2050 requests/sec
squid 3.2.0.6 3 workers 2730 requests/sec
squid 3.2.0.6 4 workers 2950 requests/sec
squid 3.2.0.6 5 workers 2830 requests/sec
squid 3.2.0.6 6 workers 2530 requests/sec
squid 3.2.0.6 7 workers 2160 requests/sec (instead of all processes being at 100%, several were at 99%)
squid 3.2.0.6 8 workers 1950 requests/sec (instead of all processes being at 100%, some were as low as 92%)

so the new versions are really about the same.

moving to large requests cut these numbers by about 1/3, but the squid processes were not maxing out the CPU.

one issue I saw: I had to reduce the number of concurrent connections or I would have requests time out (3.2 vs earlier versions); on 3.2 I had to have -c on ab at ~100-150, where I could go significantly higher on 3.1 and 3.0.

David Lang

Thank you. So with small files: 2% on 3.1 and ~7% on 3.2 with a single worker. But under 1% on multiple 3.2 workers. And overloading/flooding the I/O bandwidth on large files.

NP: when overloading I/O one cannot compare runs with different sizes, only runs of the same traffic. Also only the CPU max load is reliable there, since requests/sec bottlenecks behind the I/O. So... your measure that CPU dropped is a good sign for large files.

Amos

--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.12
  Beta testers wanted for 3.2.0.6
Re: [squid-users] squid 3.2.0.5 smp scaling issues
sorry for the delay. I got a chance to do some more testing (slightly different environment on the apache server, so these numbers are a little lower for the same versions than the last ones I posted).

results when requesting a short html page:

squid 3.0.STABLE12 4000 requests/sec
squid 3.1.11 1500 requests/sec
squid 3.1.12 1530 requests/sec
squid 3.2.0.5 1 worker 1300 requests/sec
squid 3.2.0.5 2 workers 2050 requests/sec
squid 3.2.0.5 3 workers 2700 requests/sec
squid 3.2.0.5 4 workers 2950 requests/sec
squid 3.2.0.5 5 workers 2900 requests/sec
squid 3.2.0.5 6 workers 2530 requests/sec
squid 3.2.0.6 1 worker 1400 requests/sec
squid 3.2.0.6 2 workers 2050 requests/sec
squid 3.2.0.6 3 workers 2730 requests/sec
squid 3.2.0.6 4 workers 2950 requests/sec
squid 3.2.0.6 5 workers 2830 requests/sec
squid 3.2.0.6 6 workers 2530 requests/sec
squid 3.2.0.6 7 workers 2160 requests/sec (instead of all processes being at 100%, several were at 99%)
squid 3.2.0.6 8 workers 1950 requests/sec (instead of all processes being at 100%, some were as low as 92%)

so the new versions are really about the same.

moving to large requests cut these numbers by about 1/3, but the squid processes were not maxing out the CPU.

one issue I saw: I had to reduce the number of concurrent connections or I would have requests time out (3.2 vs earlier versions); on 3.2 I had to have -c on ab at ~100-150, where I could go significantly higher on 3.1 and 3.0.

David Lang

On Mon, 4 Apr 2011, da...@lang.hm wrote:

On Mon, 4 Apr 2011, Amos Jeffries wrote:

On 03/04/11 12:52, da...@lang.hm wrote:

still no response from anyone. Is there any interest in investigating this issue? or should I just write off squid for future use due to its performance degrading?

It is a very ambiguous issue..

* We have your report with some nice rate benchmarks indicating regression.
* We have two others saying me-too with less details.
* We have an independent report indicating that 3.1 is faster than 2.7. With benchmarks to prove it.
* We have several independent reports indicating that 3.2 is faster than 3.1. One like yours with benchmark proof.
* We have someone responding to your report saying the CPU type affects things in a large way (likely due to SMP using CPU-level features).
* We have our own internal testing which shows also a mix of results, with the variance being dependent on which component of Squid is tested.

Your test in particular is testing both the large object pass-thru (proxy only) capacity and the parser CPU ceiling.

Could you try your test on 3.2.0.6 and 3.1.12 please? They both now have a server-facing buffer change which should directly affect your test results in a good way.

thanks for the response; part of my frustration was just not hearing anything back.

I'll do the tests on the new version shortly (hopefully on monday). If there are other tests that people would like me to perform on the hardware I have available, please let me know.

right now I am just testing proxy/ACL with no caching, but I am testing four traffic types:

1. small static files
2. large static files
3. small dynamic files (returning the exact same data as 1, but only after a fixed delay)
4. large dynamic files

while I see a dramatic difference in the performance on the different tests, so far the ratios between the different versions have been consistent across all four scenarios.

David Lang
Re: [squid-users] squid 3.2.0.5 smp scaling issues
On Mon, 4 Apr 2011, Amos Jeffries wrote:

On 03/04/11 12:52, da...@lang.hm wrote:

still no response from anyone. Is there any interest in investigating this issue? or should I just write off squid for future use due to its performance degrading?

It is a very ambiguous issue..

* We have your report with some nice rate benchmarks indicating regression.
* We have two others saying me-too with less details.
* We have an independent report indicating that 3.1 is faster than 2.7. With benchmarks to prove it.
* We have several independent reports indicating that 3.2 is faster than 3.1. One like yours with benchmark proof.
* We have someone responding to your report saying the CPU type affects things in a large way (likely due to SMP using CPU-level features).
* We have our own internal testing which shows also a mix of results, with the variance being dependent on which component of Squid is tested.

Your test in particular is testing both the large object pass-thru (proxy only) capacity and the parser CPU ceiling.

Could you try your test on 3.2.0.6 and 3.1.12 please? They both now have a server-facing buffer change which should directly affect your test results in a good way.

thanks for the response; part of my frustration was just not hearing anything back.

I'll do the tests on the new version shortly (hopefully on monday). If there are other tests that people would like me to perform on the hardware I have available, please let me know.

right now I am just testing proxy/ACL with no caching, but I am testing four traffic types:

1. small static files
2. large static files
3. small dynamic files (returning the exact same data as 1, but only after a fixed delay)
4. large dynamic files

while I see a dramatic difference in the performance on the different tests, so far the ratios between the different versions have been consistent across all four scenarios.

David Lang
Re: [squid-users] squid 3.2.0.5 smp scaling issues
On 03/04/11 12:52, da...@lang.hm wrote:

still no response from anyone. Is there any interest in investigating this issue? or should I just write off squid for future use due to its performance degrading?

It is a very ambiguous issue..

* We have your report with some nice rate benchmarks indicating regression.
* We have two others saying me-too with less details.
* We have an independent report indicating that 3.1 is faster than 2.7. With benchmarks to prove it.
* We have several independent reports indicating that 3.2 is faster than 3.1. One like yours with benchmark proof.
* We have someone responding to your report saying the CPU type affects things in a large way (likely due to SMP using CPU-level features).
* We have our own internal testing which shows also a mix of results, with the variance being dependent on which component of Squid is tested.

Your test in particular is testing both the large object pass-thru (proxy only) capacity and the parser CPU ceiling.

Could you try your test on 3.2.0.6 and 3.1.12 please? They both now have a server-facing buffer change which should directly affect your test results in a good way.

Amos

--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.12
  Beta testers wanted for 3.2.0.6
Re: [squid-users] squid 3.2.0.5 smp scaling issues
still no response from anyone. Is there any interest in investigating this issue? or should I just write off squid for future use due to its performance degrading?

David Lang

On Sat, 26 Mar 2011, da...@lang.hm wrote:

Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues

re-sending and adding the -dev list.

performance drops going from 3.0 -> 3.1 -> 3.2, and in addition squid 3.2 scales poorly (only goes up to 2x single-threaded performance going up to 4 cores, and drops off again after that). This makes it so that I actually get better performance on 3.0 than on 3.2, even with multiple workers.

David Lang

On Mon, 21 Mar 2011, da...@lang.hm wrote:

Date: Mon, 21 Mar 2011 19:26:38 -0700 (PDT)
From: da...@lang.hm
To: squid-users@squid-cache.org
Subject: [squid-users] squid 3.2.0.5 smp scaling issues

test setup: box A running apache, with an ab test against the local IP address doing >13,000 requests/sec; box B running squid, 8 x 2.3 GHz Opteron cores with 16G RAM.

non acl/cache-peer related lines in the config are (including typos from me manually entering this):

http_port 8000
icp_port 0
visible_hostname gromit1
cache_effective_user proxy
cache_effective_group proxy
append_domain .invalid.server.name
pid_filename /var/run/squid.pid
cache_dir null /tmp
client_db off
cache_access_log syslog squid
cache_log /var/log/squid/cache.log
cache_store_log none
coredump_dir none
no_cache deny all

results when requesting a short html page:

squid 3.0.STABLE12 4200 requests/sec
squid 3.1.11 2100 requests/sec
squid 3.2.0.5 1 worker 1400 requests/sec
squid 3.2.0.5 2 workers 2100 requests/sec
squid 3.2.0.5 3 workers 2500 requests/sec
squid 3.2.0.5 4 workers 2900 requests/sec
squid 3.2.0.5 5 workers 2900 requests/sec
squid 3.2.0.5 6 workers 2500 requests/sec
squid 3.2.0.5 7 workers 2000 requests/sec
squid 3.2.0.5 8 workers 1900 requests/sec

in all these tests the squid process was using 100% of the cpu.

I tried it pulling a large file (100K instead of <50 bytes) on the thought that this may be bottlenecking on accepting the connections, and that with something that took more time to service the connections it could do better. However, what I found is that with 8 workers all 8 were using <50% of the CPU at 1000 requests/sec (the local machine would do 7000 requests/sec to itself):

1 worker 500 requests/sec
2 workers 957 requests/sec

from there it remained about 1000 requests/sec with the cpu utilization slowly dropping off (but not dropping as fast as it should with the number of cores available). So it looks like there is some significant bottleneck in version 3.2 that makes the SMP support fairly ineffective.

in reading the wiki page at wiki.squid-cache.org/Features/SmpScale I see you worrying about fairness between workers. If you have put in code to try and ensure fairness, you may want to remove it and see what happens to performance. What you are describing on that page in terms of fairness is what I would expect from a 'first-come-first-served' approach to multiple processes grabbing new connections. The worker that last ran is hot in the cache and so has an 'unfair' advantage in noticing and processing the new request, but as that worker gets busier, it will be spending more time servicing the request and the other processes will get more of a chance to grab the new connection; so it will appear unfair under light load, but become more fair under heavy load.

David Lang
Re: [squid-users] squid 3.2.0.5 smp scaling issues
re-sending and adding the -dev list.

performance drops going from 3.0 -> 3.1 -> 3.2, and in addition squid 3.2 scales poorly (only goes up to 2x single-threaded performance going up to 4 cores, and drops off again after that). This makes it so that I actually get better performance on 3.0 than on 3.2, even with multiple workers.

David Lang

On Mon, 21 Mar 2011, da...@lang.hm wrote:

Date: Mon, 21 Mar 2011 19:26:38 -0700 (PDT)
From: da...@lang.hm
To: squid-users@squid-cache.org
Subject: [squid-users] squid 3.2.0.5 smp scaling issues

test setup: box A running apache, with an ab test against the local IP address doing >13,000 requests/sec; box B running squid, 8 x 2.3 GHz Opteron cores with 16G RAM.

non acl/cache-peer related lines in the config are (including typos from me manually entering this):

http_port 8000
icp_port 0
visible_hostname gromit1
cache_effective_user proxy
cache_effective_group proxy
append_domain .invalid.server.name
pid_filename /var/run/squid.pid
cache_dir null /tmp
client_db off
cache_access_log syslog squid
cache_log /var/log/squid/cache.log
cache_store_log none
coredump_dir none
no_cache deny all

results when requesting a short html page:

squid 3.0.STABLE12 4200 requests/sec
squid 3.1.11 2100 requests/sec
squid 3.2.0.5 1 worker 1400 requests/sec
squid 3.2.0.5 2 workers 2100 requests/sec
squid 3.2.0.5 3 workers 2500 requests/sec
squid 3.2.0.5 4 workers 2900 requests/sec
squid 3.2.0.5 5 workers 2900 requests/sec
squid 3.2.0.5 6 workers 2500 requests/sec
squid 3.2.0.5 7 workers 2000 requests/sec
squid 3.2.0.5 8 workers 1900 requests/sec

in all these tests the squid process was using 100% of the cpu.

I tried it pulling a large file (100K instead of <50 bytes) on the thought that this may be bottlenecking on accepting the connections, and that with something that took more time to service the connections it could do better. However, what I found is that with 8 workers all 8 were using <50% of the CPU at 1000 requests/sec (the local machine would do 7000 requests/sec to itself):

1 worker 500 requests/sec
2 workers 957 requests/sec

from there it remained about 1000 requests/sec with the cpu utilization slowly dropping off (but not dropping as fast as it should with the number of cores available). So it looks like there is some significant bottleneck in version 3.2 that makes the SMP support fairly ineffective.

in reading the wiki page at wiki.squid-cache.org/Features/SmpScale I see you worrying about fairness between workers. If you have put in code to try and ensure fairness, you may want to remove it and see what happens to performance. What you are describing on that page in terms of fairness is what I would expect from a 'first-come-first-served' approach to multiple processes grabbing new connections. The worker that last ran is hot in the cache and so has an 'unfair' advantage in noticing and processing the new request, but as that worker gets busier, it will be spending more time servicing the request and the other processes will get more of a chance to grab the new connection; so it will appear unfair under light load, but become more fair under heavy load.

David Lang
[squid-users] squid 3.2.0.5 smp scaling issues
test setup: box A running apache, with an ab test against the local IP address doing >13,000 requests/sec; box B running squid, 8 x 2.3 GHz Opteron cores with 16G RAM.

non acl/cache-peer related lines in the config are (including typos from me manually entering this):

http_port 8000
icp_port 0
visible_hostname gromit1
cache_effective_user proxy
cache_effective_group proxy
append_domain .invalid.server.name
pid_filename /var/run/squid.pid
cache_dir null /tmp
client_db off
cache_access_log syslog squid
cache_log /var/log/squid/cache.log
cache_store_log none
coredump_dir none
no_cache deny all

results when requesting a short html page:

squid 3.0.STABLE12 4200 requests/sec
squid 3.1.11 2100 requests/sec
squid 3.2.0.5 1 worker 1400 requests/sec
squid 3.2.0.5 2 workers 2100 requests/sec
squid 3.2.0.5 3 workers 2500 requests/sec
squid 3.2.0.5 4 workers 2900 requests/sec
squid 3.2.0.5 5 workers 2900 requests/sec
squid 3.2.0.5 6 workers 2500 requests/sec
squid 3.2.0.5 7 workers 2000 requests/sec
squid 3.2.0.5 8 workers 1900 requests/sec

in all these tests the squid process was using 100% of the cpu.

I tried it pulling a large file (100K instead of <50 bytes) on the thought that this may be bottlenecking on accepting the connections, and that with something that took more time to service the connections it could do better. However, what I found is that with 8 workers all 8 were using <50% of the CPU at 1000 requests/sec (the local machine would do 7000 requests/sec to itself):

1 worker 500 requests/sec
2 workers 957 requests/sec

from there it remained about 1000 requests/sec with the cpu utilization slowly dropping off (but not dropping as fast as it should with the number of cores available). So it looks like there is some significant bottleneck in version 3.2 that makes the SMP support fairly ineffective.

in reading the wiki page at wiki.squid-cache.org/Features/SmpScale I see you worrying about fairness between workers. If you have put in code to try and ensure fairness, you may want to remove it and see what happens to performance. What you are describing on that page in terms of fairness is what I would expect from a 'first-come-first-served' approach to multiple processes grabbing new connections. The worker that last ran is hot in the cache and so has an 'unfair' advantage in noticing and processing the new request, but as that worker gets busier, it will be spending more time servicing the request and the other processes will get more of a chance to grab the new connection; so it will appear unfair under light load, but become more fair under heavy load.

David Lang
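For readers who haven't seen the model the fairness discussion assumes, here is a toy sketch (not Squid code; error handling omitted, and the port and worker count are arbitrary) of several forked workers sharing one listening socket. Every child blocks in accept() on the same fd and the kernel hands each new connection to whichever worker wins the race, which is the 'first-come-first-served' behaviour described above:

    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main()
    {
        // One listening socket, created before forking, shared by all workers.
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(8000);
        bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr));
        listen(fd, 128);

        for (int i = 0; i < 4; ++i) {      // four "workers"
            if (fork() == 0) {
                for (;;) {
                    // All idle workers race here; a busy worker is not in
                    // accept() and so naturally cedes new connections.
                    int c = accept(fd, nullptr, nullptr);
                    if (c < 0)
                        continue;
                    const char r[] =
                        "HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok";
                    write(c, r, sizeof(r) - 1);
                    close(c);
                }
            }
        }
        for (int i = 0; i < 4; ++i)
            wait(nullptr);                  // parent just waits on the children
    }

Running ab against localhost:8000 at varying concurrency should make the light-load skew visible if each worker also logs its accepts: the most recently active worker tends to win the race until load keeps it busy elsewhere.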