Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On 12/03/2013 8:11 a.m., paulm wrote:
> Excuse me David, what are the ab parameters that you use to test against Squid?

-n for request count
-c for concurrency level

SMP in Squid shares a listening port, so -c 1 will still test both workers. But the results are more interesting as you vary client count versus request count.

For the worst-case traffic scenario, test with a guaranteed MISS response; for the best case, test with a small HIT response. Other than that, whatever you like. Using a FQDN you host yourself is polite.

Amos
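The advice above (vary request count against concurrency) can be sketched as a small command generator. This is only an illustration: the URL and the proxy address 127.0.0.1:3128 are placeholders, and the use of ab's -X option to send requests through the proxy is an assumption about the test setup, not something stated in the thread.

```python
from itertools import product


def ab_commands(url, proxy, request_counts, concurrencies):
    """Build one ab command line per (request count, concurrency) pair."""
    commands = []
    for n, c in product(request_counts, concurrencies):
        if c > n:
            # ab rejects a concurrency level above the total request count.
            continue
        commands.append(["ab", "-n", str(n), "-c", str(c), "-X", proxy, url])
    return commands


if __name__ == "__main__":
    # Hypothetical target: a small object on a host you control, via a local Squid.
    for cmd in ab_commands("http://test.example.com/small.html",
                           "127.0.0.1:3128", [1000, 10000], [1, 10, 100]):
        print(" ".join(cmd))
```

The script only prints the command lines; run them by hand (or via subprocess) once for a URL that is guaranteed to MISS and once for a small object that is guaranteed to HIT.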
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
Excuse me David, what are the ab parameters that you use to test against Squid?

Thanks,
Paul
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Wed, 4 May 2011 16:36:08 -0700 (PDT), da...@lang.hm wrote:
> On Wed, 4 May 2011, Alex Rousskov wrote:
>> IMHO, you can maximize your chances of getting free help by isolating the problem better. For example, perhaps you can try to reproduce it with different kinds of fast ACLs (the simpler the better!). This will help clarify whether the problem is specific to IPv6, IP, or ACLs in general. Test different numbers of ACLs: does the problem happen only when the number of simple ACLs is huge? Make the problem easier to reproduce by posting configuration files (including Polygraph workloads or options for some other benchmarking tool you use).
>>
>> This is not a guarantee that somebody will jump in and help you, but fixing a well-triaged issue is often much easier.
>
> That's why I'm speaking up. I just have not known what to test.
>
> Are there other types of ACLs that I should be testing?

We can't answer that without having seen your config file and which ACL types are in use now. The list of all available ACL types is at http://wiki.squid-cache.org/SquidFaq/SquidAcl and http://www.squid-cache.org/Doc/config/acl/

> I'll set up some tests with different numbers of ACLs. Since I've already verified that the number of ACLs defined isn't the significant factor, only the number tested before one succeeds (by moving the ACL that allows my access from the end of the file to the beginning of the file, keeping everything else the same), I'll see if the slowdown seems proportional to the number of rules, or if there is something else going on.
>
> Any other types of testing I should do?

The above looks like a good benchmark *provided* all the ACLs have the same type with consistent content counts. Mixing types makes the result non-comparable with other tests.

If you have time (and want to), we kind of need that type of benchmarking done for each ACL type. Prioritising by popularity: src/dst by IP, port, domain and regex variants. Then proxy_auth, external (the "fake" helpers can help here). Then the others; i.e. browser, proto, method, header matching.

We know general fuzzy details like, for example, a port test is faster than a domain test. One with details presented up front by the client is also faster than one where a lookup is needed. But we have no deeper info to say whether a dstdomain test is faster or slower than a src (IP) test.

Way down my TODO list is the dream of micro-benchmarking the ACLs in their unit-tests.

Amos
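The benchmark David describes (same ACL set, with only the position of the matching allow rule changed) could be generated along these lines. This is a rough sketch, not a configuration taken from the thread: the ACL names, the 10.x filler addresses, and the client address 192.0.2.10 are all made up, and it sticks to a single ACL type (src) as Amos recommends.

```python
def acl_benchmark_config(n_rules, allow_first):
    """Return squid.conf lines: n_rules non-matching src ACLs plus one matching ACL.

    allow_first=True puts the matching allow rule before the filler rules,
    allow_first=False puts it after them, so every filler is tested first.
    """
    lines = []
    for i in range(n_rules):
        k = i + 1
        # Filler addresses that the test client is assumed never to use.
        addr = f"10.{(k >> 16) & 255}.{(k >> 8) & 255}.{k & 255}"
        lines.append(f"acl filler{i} src {addr}/32")
    # Hypothetical address of the benchmarking client.
    lines.append("acl testclient src 192.0.2.10/32")

    allow = "http_access allow testclient"
    denies = [f"http_access deny filler{i}" for i in range(n_rules)]
    lines += ([allow] + denies) if allow_first else (denies + [allow])
    lines.append("http_access deny all")
    return lines


if __name__ == "__main__":
    print("\n".join(acl_benchmark_config(5, allow_first=False)))
```

Running the same load against the allow_first=True and allow_first=False variants, for several values of n_rules, isolates the cost of the rules that are actually evaluated from the cost of merely defining them.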
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Wed, 4 May 2011, Alex Rousskov wrote:
> On 05/04/2011 12:49 PM, da...@lang.hm wrote:
>> I don't know how many developers are working on squid, so I don't know if you are the only person who can do this sort of work or not.
>
> I am sure there are others who can do this. The question is whether you can quickly find somebody interested enough to spend their time on your problem. In general, folks work on issues that are important to them or to their customers. Most active developers donate a lot of free time, but it still tends to revolve around issues they care about for one reason or another. We all have to prioritize.

I do understand this.

>> do you think that I should join the squid-dev list?
>
> I believe your messages are posted to squid-dev so you are not going to reach a wider audience if you do. If you want to write Squid code, joining is a good idea!

I don't really have the time to do coding on this project.

> IMHO, you can maximize your chances of getting free help by isolating the problem better. For example, perhaps you can try to reproduce it with different kinds of fast ACLs (the simpler the better!). This will help clarify whether the problem is specific to IPv6, IP, or ACLs in general. Test different numbers of ACLs: does the problem happen only when the number of simple ACLs is huge? Make the problem easier to reproduce by posting configuration files (including Polygraph workloads or options for some other benchmarking tool you use).
>
> This is not a guarantee that somebody will jump in and help you, but fixing a well-triaged issue is often much easier.

That's why I'm speaking up. I just have not known what to test.

Are there other types of ACLs that I should be testing?

I'll set up some tests with different numbers of ACLs. Since I've already verified that the number of ACLs defined isn't the significant factor, only the number tested before one succeeds (by moving the ACL that allows my access from the end of the file to the beginning of the file, keeping everything else the same), I'll see if the slowdown seems proportional to the number of rules, or if there is something else going on.

Any other types of testing I should do?

David Lang

> HTH,
>
> Alex.
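One way to check David's question of whether the slowdown is proportional to the number of rules tested is a plain least-squares line through the measurements. The sketch below assumes you have collected, for each test run, the number of ACLs evaluated before the match and the mean time per request reported by the benchmarking tool; the numbers in the usage example are made up purely to show the call.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical data: rules tested before the match, mean ms per request.
    rules_tested = [0, 100, 200, 400]
    ms_per_request = [1.0, 3.0, 5.0, 9.0]
    slope, intercept = fit_line(rules_tested, ms_per_request)
    print(f"about {slope:.4f} ms per extra rule, {intercept:.2f} ms baseline")
```

If the points sit close to the fitted line, the cost is roughly linear in the number of rules tested; a curve that bends upward as workers are added would instead point at something else going on.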
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Wed, 4 May 2011 11:49:01 -0700 (PDT), da...@lang.hm wrote:
> I don't know how many developers are working on squid, so I don't know if you are the only person who can do this sort of work or not.

4 part-timers and a few others focused on specific areas.

> do you think that I should join the squid-dev list?

I thought you had. If you are intending to follow this for long, it could be a good idea anyway. If you have any time to spare on tinkering with optimizations, even better.

Amos
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On 05/04/2011 12:49 PM, da...@lang.hm wrote:
> I don't know how many developers are working on squid, so I don't know
> if you are the only person who can do this sort of work or not.

I am sure there are others who can do this. The question is whether you can quickly find somebody interested enough to spend their time on your problem. In general, folks work on issues that are important to them or to their customers. Most active developers donate a lot of free time, but it still tends to revolve around issues they care about for one reason or another. We all have to prioritize.

> do you think that I should join the squid-dev list?

I believe your messages are posted to squid-dev so you are not going to reach a wider audience if you do. If you want to write Squid code, joining is a good idea!

IMHO, you can maximize your chances of getting free help by isolating the problem better. For example, perhaps you can try to reproduce it with different kinds of fast ACLs (the simpler the better!). This will help clarify whether the problem is specific to IPv6, IP, or ACLs in general. Test different numbers of ACLs: does the problem happen only when the number of simple ACLs is huge? Make the problem easier to reproduce by posting configuration files (including Polygraph workloads or options for some other benchmarking tool you use).

This is not a guarantee that somebody will jump in and help you, but fixing a well-triaged issue is often much easier.

HTH,

Alex.
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
I don't know how many developers are working on squid, so I don't know if you are the only person who can do this sort of work or not.

Do you think that I should join the squid-dev list?

David Lang

On Wed, 4 May 2011, Alex Rousskov wrote:
> On 05/04/2011 11:41 AM, da...@lang.hm wrote:
>> anything new on this issue? (including any patches for me to test?)
>
> If you mean the "ACLs do not scale well" issue, then I do not have any free cycles to work on it right now. I was happy to clarify the new SMP architecture and suggest ways to triage the issue further. Let's hope somebody else can volunteer to do the required legwork.
>
> Alex.
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On 05/04/2011 11:41 AM, da...@lang.hm wrote:
> anything new on this issue? (including any patches for me to test?)

If you mean the "ACLs do not scale well" issue, then I do not have any free cycles to work on it right now. I was happy to clarify the new SMP architecture and suggest ways to triage the issue further. Let's hope somebody else can volunteer to do the required legwork.

Alex.
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
ping, anything new on this issue? (including any patches for me to test?)

David Lang
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Mon, 25 Apr 2011, Alex Rousskov wrote:
> On 04/25/2011 06:14 PM, da...@lang.hm wrote:
>>>> if that regains the speed and/or scalability it would point fingers fairly conclusively at the DNS components.
>>>>
>>>> this is the only thing that I can think of that should be shared between multiple workers processing ACLs
>>>
>>> but it is _not_ currently shared from Squid point of view.
>>
>> Ok, I was assuming from the description of things that there would be one DNS process that all the workers would be accessing. from the way it's described in the documentation it sounds as if it's already a separate process
>
> I would like to fix that documentation, but I cannot find what phrase led you to the above conclusion. The SmpScale wiki page says:
>
>> Currently, Squid workers do not share and do not synchronize other resources or services, including:
>>
>> * DNS caches (ipcache and fqdncache);
>
> So that seems to be correct and clear. Which documentation are you referring to?

Ahh, I missed that. I was going by the description of the config options that configure and disable the DNS cache (they don't say anything about the SMP mode, but I read them to imply that the squid-internal DNS cache was a separate thread/process).

David Lang
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On 04/25/2011 06:14 PM, da...@lang.hm wrote:
>>> if that regains the speed and/or scalability it would point fingers
>>> fairly conclusively at the DNS components.
>>>
>>> this is the only thing that I can think of that should be shared between
>>> multiple workers processing ACLs
>>
>> but it is _not_ currently shared from Squid point of view.
>
> Ok, I was assuming from the description of things that there would be
> one DNS process that all the workers would be accessing. from the way
> it's described in the documentation it sounds as if it's already a
> separate process

I would like to fix that documentation, but I cannot find what phrase led you to the above conclusion. The SmpScale wiki page says:

> Currently, Squid workers do not share and do not synchronize other
> resources or services, including:
>
> * DNS caches (ipcache and fqdncache);

So that seems to be correct and clear. Which documentation are you referring to?

Thank you,

Alex.
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Mon, 25 Apr 2011, Alex Rousskov wrote: On 04/25/2011 05:31 PM, da...@lang.hm wrote: On Mon, 25 Apr 2011, da...@lang.hm wrote: On Mon, 25 Apr 2011, Alex Rousskov wrote: On 04/14/2011 09:06 PM, da...@lang.hm wrote: In addition, there seems to be some sort of locking betwen the multiple worker processes in 3.2 when checking the ACLs There are pretty much no locks in the current official SMP code. This will change as we start adding shared caches in a week or so, but even then the ACLs will remain lock-free. There could be some internal locking in the 3rd-party libraries used by ACLs (regex and such), but I do not know much about them. what are the 3rd party libraries that I would be using? See "ldd squid". Here is a sample based on a randomly picked Squid: libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol Please note that I am not saying that any of these have problems in SMP environment. I am only saying that Squid itself does not lock anything runtime so if our suspect is SMP-related locks, they would have to reside elsewhere. The other possibility is that we should suspect something else, of course. IMHO, it is more likely to be something else: after all, Squid does not use threads, where such problems are expected. BTW, do you see more-or-less even load across CPU cores? If not, you may need a patch that we find useful on older Linux kernels. It is discussed in the "Will similar workers receive similar amount of work?" section of http://wiki.squid-cache.org/Features/SmpScale the load is pretty even across all workers. 
with the problems descripted on that page, I would expect uneven utilization at low loads, but at high loads (with the workers busy serviceing requests rather than waiting for new connections), I would expect the work to even out (and the types of hacks described in that section to end up costing performance, but not in a way that would scale with the ACL processing load) one thought I had is that this could be locking on name lookups. how hard would it be to create a quick patch that would bypass the name lookups entirely and only do the lookups by IP. I did not realize your ACLs use DNS lookups. Squid internal DNS code does not have any runtime SMP locks. However, the presence of DNS lookups increases the number of suspects. they don't, everything in my test environment is by IP. But I've seen other software that still runs everything through name lookups, even if what's presented to the software (both in what's requested and in the ACLs) is all done by IPs. It's a easy way to bullet-proof the input (if it's a name it gets resolved, if it's an IP, the IP comes back as-is, and it works for IPv4 and IPv6, no need to have logic that looks at the value and tries to figure out if the user intended to type a name or an IP). I don't know how squid is working internally (it's a pretty large codebase, and I haven't tried to really dive into it) so I don't know if squid does this or not. A patch you propose does not sound difficult to me, but since I cannot contribute such a patch soon, it is probably better to test with ACLs that do not require any DNS lookups instead. if that regains the speed and/or scalability it would point fingers fairly conclusively at the DNS components. this is the only think that I can think of that should be shared between multiple workers processing ACLs but it is _not_ currently shared from Squid point of view. Ok, I was assuming from the description of things that there would be one DNS process that all the workers would be accessing. 
from the way it's described in the documentation it sounds as if it's already a separate process, so I was thinking that it was possible that if each ACL IP address is being put through a single DNS process, I could be running into contention on that process (and having to do name lookups for both IPv6 and then falling back to IPv4 would explain the severe performance hit far more than the difference between IPs being 128 bit values instead of 32 bit values) David Lang
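The point about IP literals not needing a resolver can be illustrated outside Squid. What follows is a minimal Python sketch, not Squid code: `parse_ip_literal` and `needs_dns` are hypothetical helper names, and the only library call used is the standard `ipaddress.ip_address()`, which parses text and never performs a DNS lookup.

```python
import ipaddress


def parse_ip_literal(value):
    """Return an ipaddress object if value is an IPv4/IPv6 literal, else None.

    ipaddress.ip_address() only parses the text; no resolver is consulted.
    """
    try:
        return ipaddress.ip_address(value)
    except ValueError:
        return None


def needs_dns(value):
    """True when the value is not an IP literal and would have to be resolved."""
    return parse_ip_literal(value) is None
```

With a check like this in front of the resolver, an ACL file that contains only addresses would never reach the name-lookup path, which is the condition the proposed test is trying to establish.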
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On 04/25/2011 05:31 PM, da...@lang.hm wrote:
> On Mon, 25 Apr 2011, da...@lang.hm wrote:
>> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>>> On 04/14/2011 09:06 PM, da...@lang.hm wrote:
>>>> In addition, there seems to be some sort of locking between the
>>>> multiple worker processes in 3.2 when checking the ACLs
>>>
>>> There are pretty much no locks in the current official SMP code. This
>>> will change as we start adding shared caches in a week or so, but even
>>> then the ACLs will remain lock-free. There could be some internal
>>> locking in the 3rd-party libraries used by ACLs (regex and such), but I
>>> do not know much about them.
>>
>> what are the 3rd party libraries that I would be using?

See "ldd squid". Here is a sample based on a randomly picked Squid:

libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol

Please note that I am not saying that any of these have problems in an SMP environment. I am only saying that Squid itself does not lock anything at runtime, so if our suspect is SMP-related locks, they would have to reside elsewhere. The other possibility is that we should suspect something else, of course. IMHO, it is more likely to be something else: after all, Squid does not use threads, where such problems are expected.

BTW, do you see more-or-less even load across CPU cores? If not, you may need a patch that we find useful on older Linux kernels. It is discussed in the "Will similar workers receive similar amount of work?" section of http://wiki.squid-cache.org/Features/SmpScale

> one thought I had is that this could be locking on name lookups. how
> hard would it be to create a quick patch that would bypass the name
> lookups entirely and only do the lookups by IP.

I did not realize your ACLs use DNS lookups. Squid internal DNS code does not have any runtime SMP locks. However, the presence of DNS lookups increases the number of suspects.

A patch you propose does not sound difficult to me, but since I cannot contribute such a patch soon, it is probably better to test with ACLs that do not require any DNS lookups instead.

> if that regains the speed and/or scalability it would point fingers
> fairly conclusively at the DNS components.
>
> this is the only thing that I can think of that should be shared between
> multiple workers processing ACLs

but it is _not_ currently shared from Squid point of view.

Cheers,

Alex.
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Mon, 25 Apr 2011, da...@lang.hm wrote: On Mon, 25 Apr 2011, Alex Rousskov wrote: On 04/14/2011 09:06 PM, da...@lang.hm wrote:

In addition, there seems to be some sort of locking between the multiple worker processes in 3.2 when checking the ACLs

There are pretty much no locks in the current official SMP code. This will change as we start adding shared caches in a week or so, but even then the ACLs will remain lock-free. There could be some internal locking in the 3rd-party libraries used by ACLs (regex and such), but I do not know much about them.

what are the 3rd party libraries that I would be using?

one thought I had is that this could be locking on name lookups. how hard would it be to create a quick patch that would bypass the name lookups entirely and only do the lookups by IP.

if that regains the speed and/or scalability it would point fingers fairly conclusively at the DNS components.

this is the only thing that I can think of that should be shared between multiple workers processing ACLs

David Lang
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On Mon, 25 Apr 2011, Alex Rousskov wrote: On 04/14/2011 09:06 PM, da...@lang.hm wrote: Ok, I finally got a chance to test 2.7STABLE9 it performs about the same as squid 3.0, possibly a little better. with my somewhat stripped down config (smaller regex patterns, replacing CIDR blocks and names that would need to be looked up in /etc/hosts with individual IP addresses) 2.7 gives ~4800 requests/sec 3.0 gives ~4600 requests/sec 3.2.0.6 with 1 worker gives ~1300 requests/sec 3.2.0.6 with 5 workers gives ~2800 requests/sec Glad you did not see a significant regression between v2.7 and v3.0. We have heard rather different stories. Every environment is different, and many lab tests are misguided, of course, but it is still good to hear positive reports. The difference between v3.2 and v3.0 is known and have been discussed on squid-dev. A few specific culprits are also known, but more need to be identified. We are working on identifying these performance bugs and reducing that difference. let me know if there are any tests that I can run that will help you. As for 1 versus 5 worker difference, it seems to be specific to your environment (as discussed below). the numbers for 3.0 are slightly better than what I was getting with the full ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I got from the last round of tests (with either the full or simplified ruleset) so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and the ability to use multiple worker processes in 3.2 doesn't make up for this. the time taken seems to almost all be in the ACL avaluation as eliminating all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec. If ACLs are the major culprit in your environment, then this is most likely not a problem in Squid source code. AFAIK, there are no locks or other synchronization primitives/overheads when it comes to Squid ACLs. 
The solution may lie in optimizing some 3rd-party libraries (used by ACLs) or in optimizing how they are used by Squid, depending on what ACLs you use. As far as Squid-specific code is concerned, you should see nearly linear ACL scale with the number of workers. given that my ACLs are IP/port matches or regex matches (and I've tested replacing the regex matches with IP matches with no significant change in performance), what components would be used. one theory is that even though I have IPv6 disabled on this build, the added space and more expensive checks needed to compare IPv6 addresses instead of IPv4 addresses accounts for the single worker drop of ~66%. that seems rather expensive, even though there are 293 http_access lines (and one of them uses external file contents in it's acls, so it's a total of ~2400 source/destination pairs, however due to the ability to shortcut the comparison the number of tests that need to be done should be <400) Yes, IPv6 is one of the known major performance regression culprits, but IPv6 ACLs should still scale linearly with the number of workers, AFAICT. Please note that I am not an ACL expert. I am just talking from the overall Squid SMP design point of view and from our testing/deployment experience point of view. that makes sense and is what I would have expected, but in my case (lots of ACLs) I am seeing a definante problem with more workers not completing more work, and beyond about 5 workers I am seeing the total work being completed drop. I can't think of any reason besides locking that this may be the case. In addition, there seems to be some sort of locking betwen the multiple worker processes in 3.2 when checking the ACLs There are pretty much no locks in the current official SMP code. This will change as we start adding shared caches in a week or so, but even then the ACLs will remain lock-free. 
There could be some internal locking in the 3rd-party libraries used by ACLs (regex and such), but I do not know much about them. what are the 3rd party libraries that I would be using? David Lang HTH, Alex. On Wed, 13 Apr 2011, Marcos wrote: Hi David, could you run and publish your benchmark with squid 2.7 ??? i'd like to know if is there any regression between 2.7 and 3.x series. thanks. Marcos - Mensagem original De: "da...@lang.hm" Para: Amos Jeffries Cc: squid-users@squid-cache.org; squid-...@squid-cache.org Enviadas: S?bado, 9 de Abril de 2011 12:56:12 Assunto: Re: [squid-users] squid 3.2.0.5 smp scaling issues On Sat, 9 Apr 2011, Amos Jeffries wrote: On 09/04/11 14:27, da...@lang.hm wrote: A couple more things about the ACLs used in my test all of them are allow ACLs (no deny rules to worry about precidence of) except for a deny-all at the bottom the ACL line that permits the test source to the test destination has zero overlap with the rest of the rules every rule has an IP based restriction (even the ones with url_regex are source -> URL regex) I moved the ACL that allows my test from the bottom of the ruleset to the top and
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
On 04/14/2011 09:06 PM, da...@lang.hm wrote: > Ok, I finally got a chance to test 2.7STABLE9 > > it performs about the same as squid 3.0, possibly a little better. > > with my somewhat stripped down config (smaller regex patterns, replacing > CIDR blocks and names that would need to be looked up in /etc/hosts with > individual IP addresses) > > 2.7 gives ~4800 requests/sec > 3.0 gives ~4600 requests/sec > 3.2.0.6 with 1 worker gives ~1300 requests/sec > 3.2.0.6 with 5 workers gives ~2800 requests/sec Glad you did not see a significant regression between v2.7 and v3.0. We have heard rather different stories. Every environment is different, and many lab tests are misguided, of course, but it is still good to hear positive reports. The difference between v3.2 and v3.0 is known and have been discussed on squid-dev. A few specific culprits are also known, but more need to be identified. We are working on identifying these performance bugs and reducing that difference. As for 1 versus 5 worker difference, it seems to be specific to your environment (as discussed below). > the numbers for 3.0 are slightly better than what I was getting with the > full ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I > got from the last round of tests (with either the full or simplified > ruleset) > > so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and > the ability to use multiple worker processes in 3.2 doesn't make up for > this. > > the time taken seems to almost all be in the ACL avaluation as > eliminating all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec. If ACLs are the major culprit in your environment, then this is most likely not a problem in Squid source code. AFAIK, there are no locks or other synchronization primitives/overheads when it comes to Squid ACLs. The solution may lie in optimizing some 3rd-party libraries (used by ACLs) or in optimizing how they are used by Squid, depending on what ACLs you use. 
As far as Squid-specific code is concerned, you should see nearly linear ACL scale with the number of workers. > one theory is that even though I have IPv6 disabled on this build, the > added space and more expensive checks needed to compare IPv6 addresses > instead of IPv4 addresses accounts for the single worker drop of ~66%. > that seems rather expensive, even though there are 293 http_access lines > (and one of them uses external file contents in it's acls, so it's a > total of ~2400 source/destination pairs, however due to the ability to > shortcut the comparison the number of tests that need to be done should > be <400) Yes, IPv6 is one of the known major performance regression culprits, but IPv6 ACLs should still scale linearly with the number of workers, AFAICT. Please note that I am not an ACL expert. I am just talking from the overall Squid SMP design point of view and from our testing/deployment experience point of view. > In addition, there seems to be some sort of locking betwen the multiple > worker processes in 3.2 when checking the ACLs There are pretty much no locks in the current official SMP code. This will change as we start adding shared caches in a week or so, but even then the ACLs will remain lock-free. There could be some internal locking in the 3rd-party libraries used by ACLs (regex and such), but I do not know much about them. HTH, Alex. >> On Wed, 13 Apr 2011, Marcos wrote: >> >>> Hi David, >>> >>> could you run and publish your benchmark with squid 2.7 ??? >>> i'd like to know if is there any regression between 2.7 and 3.x series. >>> >>> thanks. 
>>> >>> Marcos >>> >>> >>> - Mensagem original >>> De: "da...@lang.hm" >>> Para: Amos Jeffries >>> Cc: squid-users@squid-cache.org; squid-...@squid-cache.org >>> Enviadas: S?bado, 9 de Abril de 2011 12:56:12 >>> Assunto: Re: [squid-users] squid 3.2.0.5 smp scaling issues >>> >>> On Sat, 9 Apr 2011, Amos Jeffries wrote: >>> On 09/04/11 14:27, da...@lang.hm wrote: > A couple more things about the ACLs used in my test > > all of them are allow ACLs (no deny rules to worry about precidence > of) > except for a deny-all at the bottom > > the ACL line that permits the test source to the test destination has > zero overlap with the rest of the rules > > every rule has an IP based restriction (even the ones with > url_regex are > source -> URL regex) > > I moved the ACL that allows my test from the bottom of the ruleset to > the top and the resulting performance numbers were up as if the other > ACLs didn't exist. As such it is very clear that 3.2 is evaluating > every > rule. > > I changed one of the url_regex rules to just match one line rather > than > a file containing 307 lines to see if that made a difference, and it > made no significant difference. So this indicates to me that it's not > having to fully evaluate every rule (it's able to skip doing the regex >
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
ping, I haven't seen a response to this additional information that I sent out last week. squid 3.1 and 3.2 are a significant regression in performance from squid 2.7 or 3.0 David Lang On Thu, 14 Apr 2011, da...@lang.hm wrote: Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues Ok, I finally got a chance to test 2.7STABLE9 it performs about the same as squid 3.0, possibly a little better. with my somewhat stripped down config (smaller regex patterns, replacing CIDR blocks and names that would need to be looked up in /etc/hosts with individual IP addresses) 2.7 gives ~4800 requests/sec 3.0 gives ~4600 requests/sec 3.2.0.6 with 1 worker gives ~1300 requests/sec 3.2.0.6 with 5 workers gives ~2800 requests/sec the numbers for 3.0 are slightly better than what I was getting with the full ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I got from the last round of tests (with either the full or simplified ruleset) so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and the ability to use multiple worker processes in 3.2 doesn't make up for this. the time taken seems to almost all be in the ACL avaluation as eliminating all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec. one theory is that even though I have IPv6 disabled on this build, the added space and more expensive checks needed to compare IPv6 addresses instead of IPv4 addresses accounts for the single worker drop of ~66%. 
that seems rather expensive, even though there are 293 http_access lines (and one of them uses external file contents in it's acls, so it's a total of ~2400 source/destination pairs, however due to the ability to shortcut the comparison the number of tests that need to be done should be <400) In addition, there seems to be some sort of locking betwen the multiple worker processes in 3.2 when checking the ACLs as the test with almost no ACLs scales close to 100% per worker while with the ACLs it scales much more slowly, and above 4-5 workers actually drops off dramatically (to the point where with 8 workers the throughput is down to about what you get with 1-2 workers) I don't see any conceptual reason why the ACL checks of the different worker threads should impact each other in any way, let alone in a way that limits scalability to ~4 workers before adding more workers is a net loss. David Lang On Wed, 13 Apr 2011, Marcos wrote: Hi David, could you run and publish your benchmark with squid 2.7 ??? i'd like to know if is there any regression between 2.7 and 3.x series. thanks. Marcos - Mensagem original De: "da...@lang.hm" Para: Amos Jeffries Cc: squid-users@squid-cache.org; squid-...@squid-cache.org Enviadas: S?bado, 9 de Abril de 2011 12:56:12 Assunto: Re: [squid-users] squid 3.2.0.5 smp scaling issues On Sat, 9 Apr 2011, Amos Jeffries wrote: On 09/04/11 14:27, da...@lang.hm wrote: A couple more things about the ACLs used in my test all of them are allow ACLs (no deny rules to worry about precidence of) except for a deny-all at the bottom the ACL line that permits the test source to the test destination has zero overlap with the rest of the rules every rule has an IP based restriction (even the ones with url_regex are source -> URL regex) I moved the ACL that allows my test from the bottom of the ruleset to the top and the resulting performance numbers were up as if the other ACLs didn't exist. As such it is very clear that 3.2 is evaluating every rule. 
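The scaling complaint can be put into numbers using the figures reported in this message. The following is a small Python sketch (the function name `scaling_efficiency` is an illustrative assumption, not something from Squid or from the benchmark tool) that compares measured throughput against ideal linear scaling from the single-worker result.

```python
def scaling_efficiency(single_worker_rps, n_workers, measured_rps):
    """Fraction of ideal linear throughput that was actually achieved."""
    return measured_rps / (single_worker_rps * n_workers)


# Figures reported in this thread for 3.2.0.6 with the ACLs in place.
one_worker_rps = 1300
five_worker_rps = 2800

speedup = five_worker_rps / one_worker_rps
efficiency = scaling_efficiency(one_worker_rps, 5, five_worker_rps)
print(f"speedup with 5 workers: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

On these numbers five workers deliver a little over twice the single-worker throughput, roughly 43% of what linear scaling would give, whereas the near-empty ACL configuration is described as scaling close to 100% per worker.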
I changed one of the url_regex rules to just match one line rather than a file containing 307 lines to see if that made a difference, and it made no significant difference. So this indicates to me that it's not having to fully evaluate every rule (it's able to skip doing the regex if the IP match doesn't work) I then changed all the acl lines that used hostnames to have IP addresses in them, and this also made no significant difference I then changed all subnet matches to single IP address (just nuked /## throughout the config file) and this also made no significant difference. Squid has always worked this way. It will *test* every rule from the top down to the one that matches. Also testing each line left-to-right until one fails or the whole line matches. so why are the address matches so expensive 3.0 and older IP address is a 32-bit comparison. 3.1 and newer IP address is a 128-bit comparison with memcmp(). If something like a word-wise comparison can be implemented faster than memcmp() we would welcome it. I wonder if there should be a different version that's used when IPv6 is disabled. this is a pretty large hit. if the data is aligned properly, on a 64 bit system this should still only be 2 compares. do you do any alignment on the data now? and as noted in the e-mail below, why do these checks not scale nicely with the number of worker processes? If
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
Ok, I finally got a chance to test 2.7STABLE9

it performs about the same as squid 3.0, possibly a little better.

with my somewhat stripped down config (smaller regex patterns, replacing CIDR blocks and names that would need to be looked up in /etc/hosts with individual IP addresses)

2.7 gives ~4800 requests/sec
3.0 gives ~4600 requests/sec
3.2.0.6 with 1 worker gives ~1300 requests/sec
3.2.0.6 with 5 workers gives ~2800 requests/sec

the numbers for 3.0 are slightly better than what I was getting with the full ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I got from the last round of tests (with either the full or simplified ruleset)

so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and the ability to use multiple worker processes in 3.2 doesn't make up for this.

the time taken seems to almost all be in the ACL evaluation as eliminating all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec.

one theory is that even though I have IPv6 disabled on this build, the added space and more expensive checks needed to compare IPv6 addresses instead of IPv4 addresses accounts for the single worker drop of ~66%.
that seems rather expensive, even though there are 293 http_access lines (and one of them uses external file contents in it's acls, so it's a total of ~2400 source/destination pairs, however due to the ability to shortcut the comparison the number of tests that need to be done should be <400) In addition, there seems to be some sort of locking betwen the multiple worker processes in 3.2 when checking the ACLs as the test with almost no ACLs scales close to 100% per worker while with the ACLs it scales much more slowly, and above 4-5 workers actually drops off dramatically (to the point where with 8 workers the throughput is down to about what you get with 1-2 workers) I don't see any conceptual reason why the ACL checks of the different worker threads should impact each other in any way, let alone in a way that limits scalability to ~4 workers before adding more workers is a net loss. David Lang On Wed, 13 Apr 2011, Marcos wrote: Hi David, could you run and publish your benchmark with squid 2.7 ??? i'd like to know if is there any regression between 2.7 and 3.x series. thanks. Marcos - Mensagem original De: "da...@lang.hm" Para: Amos Jeffries Cc: squid-users@squid-cache.org; squid-...@squid-cache.org Enviadas: S?bado, 9 de Abril de 2011 12:56:12 Assunto: Re: [squid-users] squid 3.2.0.5 smp scaling issues On Sat, 9 Apr 2011, Amos Jeffries wrote: On 09/04/11 14:27, da...@lang.hm wrote: A couple more things about the ACLs used in my test all of them are allow ACLs (no deny rules to worry about precidence of) except for a deny-all at the bottom the ACL line that permits the test source to the test destination has zero overlap with the rest of the rules every rule has an IP based restriction (even the ones with url_regex are source -> URL regex) I moved the ACL that allows my test from the bottom of the ruleset to the top and the resulting performance numbers were up as if the other ACLs didn't exist. As such it is very clear that 3.2 is evaluating every rule. 
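The effect of moving the matching rule from the bottom to the top follows directly from top-down, first-match-wins evaluation. Here is a minimal Python sketch of that evaluation order. It is not Squid's implementation: `check_access`, `src_is`, `url_contains`, the 200 filler rules, and the default of "deny" when nothing matches are all assumptions made for the illustration.

```python
def check_access(rules, request):
    """Evaluate rules top-down; the first line whose predicates all match wins.

    rules: list of (action, [predicate, ...]). Predicates on a line are tested
    left to right and the line is abandoned at the first one that fails.
    Returns (action, number_of_predicate_tests_performed).
    """
    tests = 0
    for action, predicates in rules:
        matched = True
        for predicate in predicates:
            tests += 1
            if not predicate(request):
                matched = False
                break  # skip the remaining (possibly costly) predicates
        if matched:
            return action, tests
    return "deny", tests  # assumed default for this sketch


def src_is(ip):
    return lambda req: req["src"] == ip


def url_contains(text):
    return lambda req: text in req["url"]


# 200 non-matching allow lines, each: source check first, then a URL check.
filler = [("allow", [src_is(f"10.0.0.{i}"), url_contains("never")])
          for i in range(1, 201)]
wanted = ("allow", [src_is("192.0.2.1"), url_contains("example")])
request = {"src": "192.0.2.1", "url": "http://example.com/"}

print(check_access(filler + [wanted], request))  # matching rule last
print(check_access([wanted] + filler, request))  # matching rule first
```

With the matching rule last the sketch performs 202 predicate tests; with it first, only 2. The URL check on the filler lines is never reached because the source check fails first, which mirrors the observation that shrinking the regex file made no significant difference.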
I changed one of the url_regex rules to just match one line rather than a file containing 307 lines to see if that made a difference, and it made no significant difference. So this indicates to me that it's not having to fully evaluate every rule (it's able to skip doing the regex if the IP match doesn't work) I then changed all the acl lines that used hostnames to have IP addresses in them, and this also made no significant difference I then changed all subnet matches to single IP address (just nuked /## throughout the config file) and this also made no significant difference. Squid has always worked this way. It will *test* every rule from the top down to the one that matches. Also testing each line left-to-right until one fails or the whole line matches. so why are the address matches so expensive 3.0 and older IP address is a 32-bit comparison. 3.1 and newer IP address is a 128-bit comparison with memcmp(). If something like a word-wise comparison can be implemented faster than memcmp() we would welcome it. I wonder if there should be a different version that's used when IPv6 is disabled. this is a pretty large hit. if the data is aligned properly, on a 64 bit system this should still only be 2 compares. do you do any alignment on the data now? and as noted in the e-mail below, why do these checks not scale nicely with the number of worker processes? If they did, the fact that one 3.2 process is about 1/3 the speed of a 3.0 process in checking the acls wouldn't matter nearly as much when it's so easy to get an 8+ core system. There you have the unknown. I think this is a fairly critical thing to figure out.
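The "2 compares on a 64-bit system" idea can be shown in a few lines. This Python sketch only demonstrates that a 16-byte comparison and a two-word comparison give the same answer; it says nothing about speed, which would have to be measured in C/C++ inside Squid. It assumes, for illustration, that IPv4 addresses are held in the 16-byte IPv4-mapped form (`::ffff:a.b.c.d`); the function names are hypothetical.

```python
import ipaddress
import struct


def to_16_bytes(text):
    """Pack an address into 16 bytes; IPv4 uses the ::ffff:a.b.c.d mapped form."""
    ip = ipaddress.ip_address(text)
    if ip.version == 4:
        return b"\x00" * 10 + b"\xff\xff" + ip.packed
    return ip.packed


def equal_bytewise(a, b):
    """Analogue of memcmp(a, b, 16) == 0."""
    return a == b


def equal_wordwise(a, b):
    """Compare the same 16 bytes as two unsigned 64-bit words."""
    a_hi, a_lo = struct.unpack("!QQ", a)
    b_hi, b_lo = struct.unpack("!QQ", b)
    return a_hi == b_hi and a_lo == b_lo
```

For an IPv4-mapped address the high word is always zero, so in this layout only the low word ever differs between two IPv4 hosts. Whether a hand-written word-wise compare would beat the platform's `memcmp()` is exactly the open question raised above.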
Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
sorry, haven't had time to do that yet. I will try and get this done today.

David Lang

On Wed, 13 Apr 2011, Marcos wrote:

Date: Wed, 13 Apr 2011 04:11:09 -0700 (PDT)
From: Marcos
To: da...@lang.hm, Amos Jeffries
Cc: squid-users@squid-cache.org, squid-...@squid-cache.org
Subject: Res: [squid-users] squid 3.2.0.5 smp scaling issues

Hi David, could you run and publish your benchmark with squid 2.7 ??? i'd like to know if is there any regression between 2.7 and 3.x series. thanks. Marcos

- Original message
From: "da...@lang.hm"
To: Amos Jeffries
Cc: squid-users@squid-cache.org; squid-...@squid-cache.org
Sent: Saturday, 9 April 2011 12:56:12
Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues

On Sat, 9 Apr 2011, Amos Jeffries wrote: On 09/04/11 14:27, da...@lang.hm wrote:

A couple more things about the ACLs used in my test

all of them are allow ACLs (no deny rules to worry about precedence of) except for a deny-all at the bottom

the ACL line that permits the test source to the test destination has zero overlap with the rest of the rules

every rule has an IP based restriction (even the ones with url_regex are source -> URL regex)

I moved the ACL that allows my test from the bottom of the ruleset to the top and the resulting performance numbers were up as if the other ACLs didn't exist. As such it is very clear that 3.2 is evaluating every rule.

I changed one of the url_regex rules to just match one line rather than a file containing 307 lines to see if that made a difference, and it made no significant difference. So this indicates to me that it's not having to fully evaluate every rule (it's able to skip doing the regex if the IP match doesn't work)

I then changed all the acl lines that used hostnames to have IP addresses in them, and this also made no significant difference

I then changed all subnet matches to single IP address (just nuked /## throughout the config file) and this also made no significant difference.

Squid has always worked this way.
It will *test* every rule from the top down to the one that matches. Also testing each line left-to-right until one fails or the whole line matches. so why are the address matches so expensive 3.0 and older IP address is a 32-bit comparison. 3.1 and newer IP address is a 128-bit comparison with memcmp(). If something like a word-wise comparison can be implemented faster than memcmp() we would welcome it. I wonder if there should be a different version that's used when IPv6 is disabled. this is a pretty large hit. if the data is aligned properly, on a 64 bit system this should still only be 2 compares. do you do any alignment on the data now? and as noted in the e-mail below, why do these checks not scale nicely with the number of worker processes? If they did, the fact that one 3.2 process is about 1/3 the speed of a 3.0 process in checking the acls wouldn't matter nearly as much when it's so easy to get an 8+ core system. There you have the unknown. I think this is a fairly critical thing to figure out. it seems to me that all accept/deny rules in a set should be able to be combined into a tree to make searching them very fast. so for example if you have accept 1 accept 2 deny 3 deny 4 accept 5 you need to create three trees (one with accept 1 and accept 2, one with deny3 and deny4, and one with accept 5) and then check each tree to see if you have a match. the types of match could be done in order of increasing cost, so if you The config file is specific structure configured by admin under guaranteed rules of operation for access lines (top-down, left-to-right, first-match-wins) to perform boolean-logic calculations using ACL sets. Sorting access line rules is not an option. Sorting ACL values and tree-forming them is already done (regex being the one exception AFAIK). Sorting position-wise on a single access line is also ruled out by interactions with deny_info, auth and external ACL types. 
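The grouping in the "accept 1 accept 2 deny 3 deny 4 accept 5" example can be sketched as follows. This is only an illustration of splitting an ordered rule list into consecutive runs with the same action, written in Python with `itertools.groupby`; it is not how Squid stores its access lists, the function name is hypothetical, and it does not address the `deny_info`, auth and external ACL interactions mentioned in the reply.

```python
from itertools import groupby


def group_consecutive(rules):
    """Split an ordered rule list into runs that share the same action.

    The order of the runs is preserved, so first-match-wins semantics across
    runs are unchanged; only rules inside one run are candidates for merging
    into a single lookup structure.
    """
    return [(action, [value for _, value in run])
            for action, run in groupby(rules, key=lambda rule: rule[0])]


rules = [("accept", 1), ("accept", 2), ("deny", 3), ("deny", 4), ("accept", 5)]
print(group_consecutive(rules))
```

The example yields three runs, (accept: 1, 2), (deny: 3, 4) and (accept: 5), matching the three trees described in the proposal. The final accept stays separate because merging it with the first run would move it ahead of the deny rules.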
It would seem that as long as you don't cross boundaries between the different types, you should be able to optimize within a group. using my example above, you couldn't combine the 'accept 5' with any of the other accepts, but you could combine accept 1 and 2 and combine deny 3 and 4 together.

now, I know that I don't fully understand all the possible ACL types, so this may not work for some of them, but I believe that a fairly common use case is to have either a lot of allow rules, or a lot of deny rules as a block (either a list of sites you are allowed to access, or a list of sites that are blocked), so an ability to optimize these use cases may be well worth it.

have acl entries of type port, src, dst, and url regex, organize the tree so that you check ports first, then src, then dst, then only if all that matches do you need to do the regex. This would be very similar to the shortcut logic that you use today with a single rule where you bail out when you don't