Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2013-03-11 Thread Amos Jeffries

On 12/03/2013 8:11 a.m., paulm wrote:

Excuse me David

What are the ab parameters that you use to test against squid?


-n for request count
-c for concurrency level

SMP in Squid shares a listening port so -c 1 will still test both 
workers. But the results are more interesting as you vary client count 
versus request count.


For a worst-case traffic scenario, test with a guaranteed MISS response; 
for the best case, test with a small HIT response.


Other than that, use whatever you like. Using an FQDN you host yourself is polite.
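
For illustration only (the proxy address, origin host and path below are 
placeholders, not anything from this thread), a MISS-heavy run against a 
proxy on 127.0.0.1:3128 could look something like:

   ab -n 100000 -c 50 -X 127.0.0.1:3128 http://origin.example.com/no-cache/test.html

and the HIT case is the same invocation pointed at a small object that is 
already cached, varying -c to see how throughput changes with concurrency.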

Amos


Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2013-03-11 Thread paulm
Excuse me David

What are the ab parameters that you use to test against squid?

thanks, Paul 





Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-05-04 Thread Amos Jeffries

On Wed, 4 May 2011 16:36:08 -0700 (PDT), da...@lang.hm wrote:

On Wed, 4 May 2011, Alex Rousskov wrote:


On 05/04/2011 12:49 PM, da...@lang.hm wrote:



IMHO, you can maximize your chances of getting free help by isolating
the problem better. For example, perhaps you can try to reproduce it
with different kinds of fast ACLs (the simpler the better!). This will
help clarify whether the problem is specific to IPv6, IP, or ACLs in
general. Test different numbers of ACLs: does the problem happen only
when the number of simple ACLs is huge? Make the problem easier to
reproduce by posting configuration files (including Polygraph workloads
or options for some other benchmarking tool you use).

This is not a guarantee that somebody will jump in and help you, but
fixing a well-triaged issue is often much easier.


that's why I'm speaking up. I just have not known what to test.

are there other types of ACLs that I should be testing?


We can't answer that without seeing your config file and which ACLs are 
in use now.


The list of all available ACL types is at 
http://wiki.squid-cache.org/SquidFaq/SquidAcl and 
http://www.squid-cache.org/Doc/config/acl/




I'll setup some tests with differnet numbers of ACLs. since I've
already verified that the number of ACLs defined isn't the 
significant

factor, only the number tested before one succeds (by moving the ACL
that allows my access from the end of the file to the beginning of 
the

file, keeping everything else the same), I'll see if the slowdown
seems proportional to the number of rules, or if there is something
else going on.

any other types of testing I should do?


The above looks like a good benchmark *provided* all the ACLs have the 
same type with consistent content counts. Mixing types makes the result 
non-comparable with other tests.


If you have time (and want to), we kind of need that type of 
benchmarking done for each ACL type. Prioritising by popularity: src/dst 
by IP, port, domain and regex variants. Then proxy_auth, external (the 
"fake" helpers can help here). Then the others; ie browser, proto, 
method, header matching.
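
As a rough sketch of what a single-type run could look like (every name 
below is a made-up placeholder, not from David's config), a dstdomain-only 
ruleset can be padded so that the number of filler lines is the only 
variable:

   acl test_ok dstdomain .allowed.invalid
   acl filler001 dstdomain .filler001.invalid
   acl filler002 dstdomain .filler002.invalid
   # ... repeat fillers up to the desired rule count ...
   http_access allow filler001
   http_access allow filler002
   http_access allow test_ok
   http_access deny all

Requesting a URL under .allowed.invalid and moving the "allow test_ok" 
line between the bottom and the top of the list then isolates the 
per-line matching cost of that one ACL type.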


We know general fuzzy details like, for example, a port test is faster 
than a domain test. One with details presented up front by the client is 
also faster than one where a lookup is needed. But we have no deeper info 
to say whether a dstdomain test is faster or slower than a src (IP) test.


Way down my TODO list is the dream of micro-benchmarking the ACLs in 
their unit-tests.
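
In the meantime the shape of such a micro-benchmark is easy enough to 
sketch. The following is a generic timing loop, not Squid's unit-test 
harness; the lambda stands in for whatever single ACL match is being 
measured:

#include <chrono>
#include <cstdio>
#include <functional>
#include <string>

// Times repeated calls to one matcher so that different ACL types can be
// compared in isolation, outside a full proxy workload.
static double nsPerCall(const std::function<bool(const std::string &)> &match,
                        const std::string &input, int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    volatile bool sink = false;   // keeps the calls from being optimized away
    for (int i = 0; i < iterations; ++i)
        sink = match(input) || sink;
    const std::chrono::duration<double, std::nano> spent =
        std::chrono::steady_clock::now() - start;
    return spent.count() / iterations;
}

int main()
{
    auto dummyPortAcl = [](const std::string &s) { return s == "3128"; };
    std::printf("%.1f ns per match\n", nsPerCall(dummyPortAcl, "3128", 1000000));
    return 0;
}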



Amos


Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-05-04 Thread david

On Wed, 4 May 2011, Alex Rousskov wrote:


On 05/04/2011 12:49 PM, da...@lang.hm wrote:


I don't know how many developers are working on squid, so I don't know
if you are the only person who can do this sort of work or not.


I am sure there are others who can do this. The question is whether you
can quickly find somebody interested enough to spend their time on your
problem. In general, folks work on issues that are important to them or
to their customers. Most active developers donate a lot of free time,
but it still tends to revolve around issues they care about for one
reason or another. We all have to prioritize.


I do understand this.


do you think that I should join the squid-dev list?


I believe your messages are posted to squid-dev so you are not going to
reach a wider audience if you do. If you want to write Squid code,
joining is a good idea!


I don't really have the time to do coding on this project


IMHO, you can maximize your chances of getting free help by isolating
the problem better. For example, perhaps you can try to reproduce it
with different kinds of fast ACLs (the simpler the better!). This will
help clarify whether the problem is specific to IPv6, IP, or ACLs in
general. Test different numbers of ACLs: does the problem happen only
when the number of simple ACLs is huge? Make the problem easier to
reproduce by posting configuration files (including Polygraph workloads
or options for some other benchmarking tool you use).

This is not a guarantee that somebody will jump in and help you, but fixing
a well-triaged issue is often much easier.


that's why I'm speaking up. I just have not known what to test.

are there other types of ACLs that I should be testing?

I'll set up some tests with different numbers of ACLs. Since I've already 
verified that the number of ACLs defined isn't the significant factor, 
only the number tested before one succeeds (by moving the ACL that allows 
my access from the end of the file to the beginning of the file, keeping 
everything else the same), I'll see if the slowdown seems proportional to 
the number of rules, or if there is something else going on.


any other types of testing I should do?

David Lang



HTH,

Alex.



On Wed, 4 May 2011, Alex Rousskov wrote:


On 05/04/2011 11:41 AM, da...@lang.hm wrote:


anything new on this issue? (including any patches for me to test?)


If you mean the "ACLs do not scale well" issue, then I do not have any
free cycles to work on it right now.  I was happy to clarify the new SMP
architecture and suggest ways to triage the issue further. Let's hope
somebody else can volunteer to do the required legwork.

Alex.



On Mon, 25 Apr 2011, da...@lang.hm wrote:


Date: Mon, 25 Apr 2011 17:14:52 -0700 (PDT)
From: da...@lang.hm
To: Alex Rousskov 
Cc: Marcos , squid-users@squid-cache.org,
    squid-...@squid-cache.org
Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

On Mon, 25 Apr 2011, Alex Rousskov wrote:


On 04/25/2011 05:31 PM, da...@lang.hm wrote:

On Mon, 25 Apr 2011, da...@lang.hm wrote:

On Mon, 25 Apr 2011, Alex Rousskov wrote:

On 04/14/2011 09:06 PM, da...@lang.hm wrote:


In addition, there seems to be some sort of locking between the multiple
worker processes in 3.2 when checking the ACLs


There are pretty much no locks in the current official SMP code. This
will change as we start adding shared caches in a week or so, but even
then the ACLs will remain lock-free. There could be some internal
locking in the 3rd-party libraries used by ACLs (regex and such), but I
do not know much about them.


what are the 3rd party libraries that I would be using?


See "ldd squid". Here is a sample based on a randomly picked Squid:

   libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol

Please note that I am not saying that any of these have problems in an SMP
environment. I am only saying that Squid itself does not lock anything at
runtime, so if our suspect is SMP-related locks, they would have to
reside elsewhere. The other possibility is that we should suspect
something else, of course. IMHO, it is more likely to be something else:
after all, Squid does not use threads, where such problems are expected.




BTW, do you see more-or-less even load across CPU cores? If not, you may
need a patch that we find useful on older Linux kernels. It is discussed
in the "Will similar workers receive similar amount of work?" section of
http://wiki.squid-cache.org/Features/SmpScale


the load is pretty even across all workers.

with the problems described on that page, I would expect uneven
utilization at low loads, but at high loads (with the workers busy
servicing requests rather than waiting for new connections), I would
expect the work to even out (and the types of hacks described in that
section to end up costing performance, but not in a way that would
scale with the ACL processing load)


one thought I had is that this could be locking on name lookups. how
hard would it be to create a quick patch that would bypass the name
lookups entirely and only do the lookups by IP.

Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-05-04 Thread Amos Jeffries

On Wed, 4 May 2011 11:49:01 -0700 (PDT), da...@lang.hm wrote:

I don't know how many developers are working on squid, so I don't
know if you are the only person who can do this sort of work or not.


4 part-timers and a few others focused on specific areas.



do you think that I should join the squid-dev list?


I thought you had. If you are intending to follow this for long, it 
could be a good idea anyway.
If you have any time to spare on tinkering with optimizations, even 
better.


Amos



Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-05-04 Thread Alex Rousskov
On 05/04/2011 12:49 PM, da...@lang.hm wrote:

> I don't know how many developers are working on squid, so I don't knwo
> if you are the only person who can do this sort of work or not.

I am sure there are others who can do this. The question is whether you
can quickly find somebody interested enough to spend their time on your
problem. In general, folks work on issues that are important to them or
to their customers. Most active developers donate a lot of free time,
but it still tends to revolve around issues they care about for one
reason or another. We all have to prioritize.


> do you think that I should join the squid-dev list?

I believe your messages are posted to squid-dev so you are not going to
reach a wider audience if you do. If you want to write Squid code,
joining is a good idea!


IMHO, you can maximize your chances of getting free help by isolating
the problem better. For example, perhaps you can try to reproduce it
with different kinds of fast ACLs (the simpler the better!). This will
help clarify whether the problem is specific to IPv6, IP, or ACLs in
general. Test different numbers of ACLs: does the problem happen only
when the number of simple ACLs is huge? Make the problem easier to
reproduce by posting configuration files (including Polygraph workloads
or options for some other benchmarking tool you use).

This is not a guarantee that somebody will jump in and help you, but fixing
a well-triaged issue is often much easier.


HTH,

Alex.


> On Wed, 4 May 2011, Alex Rousskov wrote:
> 
>> On 05/04/2011 11:41 AM, da...@lang.hm wrote:
>>
>>> anything new on this issue? (including any patches for me to test?)
>>
>> If you mean the "ACLs do not scale well" issue, then I do not have any
>> free cycles to work on it right now.  I was happy to clarify the new SMP
>> architecture and suggest ways to triage the issue further. Let's hope
>> somebody else can volunteer to do the required legwork.
>>
>> Alex.
>>
>>
>>> On Mon, 25 Apr 2011, da...@lang.hm wrote:
>>>
>>>> Date: Mon, 25 Apr 2011 17:14:52 -0700 (PDT)
>>>> From: da...@lang.hm
>>>> To: Alex Rousskov 
>>>> Cc: Marcos , squid-users@squid-cache.org,
>>>> squid-...@squid-cache.org
>>>> Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
>>>>
>>>> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>>>>
>>>>> On 04/25/2011 05:31 PM, da...@lang.hm wrote:
>>>>>> On Mon, 25 Apr 2011, da...@lang.hm wrote:
>>>>>>> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>>>>>>>> On 04/14/2011 09:06 PM, da...@lang.hm wrote:
>>>>>>>>
>>>>>>>>> In addition, there seems to be some sort of locking betwen the
>>>>>>>>> multiple
>>>>>>>>> worker processes in 3.2 when checking the ACLs
>>>>>>>>
>>>>>>>> There are pretty much no locks in the current official SMP code.
>>>>>>>> This
>>>>>>>> will change as we start adding shared caches in a week or so, but
>>>>>>>> even
>>>>>>>> then the ACLs will remain lock-free. There could be some internal
>>>>>>>> locking in the 3rd-party libraries used by ACLs (regex and such),
>>>>>>>> but I
>>>>>>>> do not know much about them.
>>>>>>>
>>>>>>> what are the 3rd party libraries that I would be using?
>>>>>
>>>>> See "ldd squid". Here is a sample based on a randomly picked Squid:
>>>>>
>>>>>libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol
>>>>>
>>>>> Please note that I am not saying that any of these have problems in
>>>>> SMP
>>>>> environment. I am only saying that Squid itself does not lock anything
>>>>> runtime so if our suspect is SMP-related locks, they would have to
>>>>> reside elsewhere. The other possibility is that we should suspect
>>>>> something else, of course. IMHO, it is more likely to be something
>>>>> else:
>>>>> after all, Squid does not use threads, where such problems are
>>>>> expected.
>>>>
>>>>
>>>>> BTW, do you see more-or-less even load across CPU cores? If not,
>>>>> you may
>>>>> need a patch that we find useful on older Linux kernels. It is
>>>>> discussed
>>>>> in the "Will similar 

Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-05-04 Thread david
I don't know how many developers are working on squid, so I don't know if 
you are the only person who can do this sort of work or not.


do you think that I should join the squid-dev list?

David Lang

On Wed, 4 May 2011, Alex Rousskov wrote:


On 05/04/2011 11:41 AM, da...@lang.hm wrote:


anything new on this issue? (including any patches for me to test?)


If you mean the "ACLs do not scale well" issue, then I do not have any
free cycles to work on it right now.  I was happy to clarify the new SMP
architecture and suggest ways to triage the issue further. Let's hope
somebody else can volunteer to do the required legwork.

Alex.



On Mon, 25 Apr 2011, da...@lang.hm wrote:


Date: Mon, 25 Apr 2011 17:14:52 -0700 (PDT)
From: da...@lang.hm
To: Alex Rousskov 
Cc: Marcos , squid-users@squid-cache.org,
squid-...@squid-cache.org
Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

On Mon, 25 Apr 2011, Alex Rousskov wrote:


On 04/25/2011 05:31 PM, da...@lang.hm wrote:

On Mon, 25 Apr 2011, da...@lang.hm wrote:

On Mon, 25 Apr 2011, Alex Rousskov wrote:

On 04/14/2011 09:06 PM, da...@lang.hm wrote:


In addition, there seems to be some sort of locking between the multiple
worker processes in 3.2 when checking the ACLs


There are pretty much no locks in the current official SMP code. This
will change as we start adding shared caches in a week or so, but even
then the ACLs will remain lock-free. There could be some internal
locking in the 3rd-party libraries used by ACLs (regex and such), but I
do not know much about them.


what are the 3rd party libraries that I would be using?


See "ldd squid". Here is a sample based on a randomly picked Squid:

   libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol

Please note that I am not saying that any of these have problems in an SMP
environment. I am only saying that Squid itself does not lock anything at
runtime, so if our suspect is SMP-related locks, they would have to
reside elsewhere. The other possibility is that we should suspect
something else, of course. IMHO, it is more likely to be something else:
after all, Squid does not use threads, where such problems are expected.




BTW, do you see more-or-less even load across CPU cores? If not, you may
need a patch that we find useful on older Linux kernels. It is discussed
in the "Will similar workers receive similar amount of work?" section of
http://wiki.squid-cache.org/Features/SmpScale


the load is pretty even across all workers.

with the problems described on that page, I would expect uneven
utilization at low loads, but at high loads (with the workers busy
servicing requests rather than waiting for new connections), I would
expect the work to even out (and the types of hacks described in that
section to end up costing performance, but not in a way that would
scale with the ACL processing load)


one thought I had is that this could be locking on name lookups. how
hard would it be to create a quick patch that would bypass the name
lookups entirely and only do the lookups by IP.


I did not realize your ACLs use DNS lookups. Squid internal DNS code
does not have any runtime SMP locks. However, the presence of DNS
lookups increases the number of suspects.


they don't, everything in my test environment is by IP. But I've seen
other software that still runs everything through name lookups, even
if what's presented to the software (both in what's requested and in
the ACLs) is all done by IPs. It's an easy way to bullet-proof the
input (if it's a name it gets resolved, if it's an IP, the IP comes
back as-is, and it works for IPv4 and IPv6, no need to have logic that
looks at the value and tries to figure out if the user intended to
type a name or an IP). I don't know how squid is working internally
(it's a pretty large codebase, and I haven't tried to really dive into
it) so I don't know if squid does this or not.
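
For what it's worth, a minimal sketch of that "resolve only if it is not 
already an IP literal" pattern looks like the following; this is a generic 
illustration of the behaviour being described, not how Squid's ACL parser 
actually works:

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>

// True if 'token' already parses as an IPv4 or IPv6 literal.
static bool isIpLiteral(const std::string &token)
{
    in_addr v4;
    in6_addr v6;
    return inet_pton(AF_INET, token.c_str(), &v4) == 1 ||
           inet_pton(AF_INET6, token.c_str(), &v6) == 1;
}

// IP literals come back unchanged (no DNS); only real hostnames hit the resolver.
static bool normalizeToIp(const std::string &token, std::string &out)
{
    if (isIpLiteral(token)) {
        out = token;
        return true;
    }
    addrinfo hints = {};
    hints.ai_family = AF_UNSPEC;          // accept both A and AAAA answers
    addrinfo *res = nullptr;
    if (getaddrinfo(token.c_str(), nullptr, &hints, &res) != 0)
        return false;                     // the name did not resolve
    char buf[INET6_ADDRSTRLEN];
    const bool ok = getnameinfo(res->ai_addr, res->ai_addrlen,
                                buf, sizeof(buf), nullptr, 0,
                                NI_NUMERICHOST) == 0;
    freeaddrinfo(res);
    if (ok)
        out = buf;
    return ok;
}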


A patch you propose does not sound difficult to me, but since I cannot
contribute such a patch soon, it is probably better to test with ACLs
that do not require any DNS lookups instead.



if that regains the speed and/or scalability it would point fingers
fairly conclusively at the DNS components.

this is the only thing that I can think of that should be shared between
multiple workers processing ACLs


but it is _not_ currently shared from Squid's point of view.


Ok, I was assuming from the description of things that there would be
one DNS process that all the workers would be accessing. from the way
it's described in the documentation it sounds as if it's already a
separate process, so I was thinking that it was possible that if each
ACL IP address is being put through a single DNS process, I could be
running into contention on that process (and having to do name lookups
for both IPv6 and then falling back to IPv4 would explain the severe
performance hit far more than the difference between IPs being 128 bit
values instead of 32 bit values)

David Lang







Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-05-04 Thread Alex Rousskov
On 05/04/2011 11:41 AM, da...@lang.hm wrote:

> anything new on this issue? (including any patches for me to test?)

If you mean the "ACLs do not scale well" issue, then I do not have any
free cycles to work on it right now.  I was happy to clarify the new SMP
architecture and suggest ways to triage the issue further. Let's hope
somebody else can volunteer to do the required legwork.

Alex.


> On Mon, 25 Apr 2011, da...@lang.hm wrote:
> 
>> Date: Mon, 25 Apr 2011 17:14:52 -0700 (PDT)
>> From: da...@lang.hm
>> To: Alex Rousskov 
>> Cc: Marcos , squid-users@squid-cache.org,
>> squid-...@squid-cache.org
>> Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
>>
>> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>>
>>> On 04/25/2011 05:31 PM, da...@lang.hm wrote:
>>>> On Mon, 25 Apr 2011, da...@lang.hm wrote:
>>>>> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>>>>>> On 04/14/2011 09:06 PM, da...@lang.hm wrote:
>>>>>>
>>>>>>> In addition, there seems to be some sort of locking betwen the
>>>>>>> multiple
>>>>>>> worker processes in 3.2 when checking the ACLs
>>>>>>
>>>>>> There are pretty much no locks in the current official SMP code. This
>>>>>> will change as we start adding shared caches in a week or so, but
>>>>>> even
>>>>>> then the ACLs will remain lock-free. There could be some internal
>>>>>> locking in the 3rd-party libraries used by ACLs (regex and such),
>>>>>> but I
>>>>>> do not know much about them.
>>>>>
>>>>> what are the 3rd party libraries that I would be using?
>>>
>>> See "ldd squid". Here is a sample based on a randomly picked Squid:
>>>
>>>libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol
>>>
>>> Please note that I am not saying that any of these have problems in SMP
>>> environment. I am only saying that Squid itself does not lock anything
>>> runtime so if our suspect is SMP-related locks, they would have to
>>> reside elsewhere. The other possibility is that we should suspect
>>> something else, of course. IMHO, it is more likely to be something else:
>>> after all, Squid does not use threads, where such problems are expected.
>>
>>
>>> BTW, do you see more-or-less even load across CPU cores? If not, you may
>>> need a patch that we find useful on older Linux kernels. It is discussed
>>> in the "Will similar workers receive similar amount of work?" section of
>>> http://wiki.squid-cache.org/Features/SmpScale
>>
>> the load is pretty even across all workers.
>>
>> with the problems descripted on that page, I would expect uneven
>> utilization at low loads, but at high loads (with the workers busy
>> serviceing requests rather than waiting for new connections), I would
>> expect the work to even out (and the types of hacks described in that
>> section to end up costing performance, but not in a way that would
>> scale with the ACL processing load)
>>
>>>> one thought I had is that this could be locking on name lookups. how
>>>> hard would it be to create a quick patch that would bypass the name
>>>> lookups entirely and only do the lookups by IP.
>>>
>>> I did not realize your ACLs use DNS lookups. Squid internal DNS code
>>> does not have any runtime SMP locks. However, the presence of DNS
>>> lookups increases the number of suspects.
>>
>> they don't, everything in my test environment is by IP. But I've seen
>> other software that still runs everything through name lookups, even
>> if what's presented to the software (both in what's requested and in
>> the ACLs) is all done by IPs. It's a easy way to bullet-proof the
>> input (if it's a name it gets resolved, if it's an IP, the IP comes
>> back as-is, and it works for IPv4 and IPv6, no need to have logic that
>> looks at the value and tries to figure out if the user intended to
>> type a name or an IP). I don't know how squid is working internally
>> (it's a pretty large codebase, and I haven't tried to really dive into
>> it) so I don't know if squid does this or not.
>>
>>> A patch you propose does not sound difficult to me, but since I cannot
>>> contribute such a patch soon, it is probably better to test with ACLs
>>> that do not require any DNS lookups instead.
>>>

Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-05-04 Thread david

ping,

anything new on this issue? (including any patches for me to test?)

David Lang

On Mon, 25 Apr 2011, da...@lang.hm wrote:


Date: Mon, 25 Apr 2011 17:14:52 -0700 (PDT)
From: da...@lang.hm
To: Alex Rousskov 
Cc: Marcos , squid-users@squid-cache.org,
squid-...@squid-cache.org
Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

On Mon, 25 Apr 2011, Alex Rousskov wrote:


On 04/25/2011 05:31 PM, da...@lang.hm wrote:

On Mon, 25 Apr 2011, da...@lang.hm wrote:

On Mon, 25 Apr 2011, Alex Rousskov wrote:

On 04/14/2011 09:06 PM, da...@lang.hm wrote:


In addition, there seems to be some sort of locking between the multiple
worker processes in 3.2 when checking the ACLs


There are pretty much no locks in the current official SMP code. This
will change as we start adding shared caches in a week or so, but even
then the ACLs will remain lock-free. There could be some internal
locking in the 3rd-party libraries used by ACLs (regex and such), but I
do not know much about them.


what are the 3rd party libraries that I would be using?


See "ldd squid". Here is a sample based on a randomly picked Squid:

   libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol

Please note that I am not saying that any of these have problems in an SMP
environment. I am only saying that Squid itself does not lock anything at
runtime, so if our suspect is SMP-related locks, they would have to
reside elsewhere. The other possibility is that we should suspect
something else, of course. IMHO, it is more likely to be something else:
after all, Squid does not use threads, where such problems are expected.




BTW, do you see more-or-less even load across CPU cores? If not, you may
need a patch that we find useful on older Linux kernels. It is discussed
in the "Will similar workers receive similar amount of work?" section of
http://wiki.squid-cache.org/Features/SmpScale


the load is pretty even across all workers.

with the problems described on that page, I would expect uneven utilization 
at low loads, but at high loads (with the workers busy servicing requests 
rather than waiting for new connections), I would expect the work to even out 
(and the types of hacks described in that section to end up costing 
performance, but not in a way that would scale with the ACL processing load)



one thought I had is that this could be locking on name lookups. how
hard would it be to create a quick patch that would bypass the name
lookups entirely and only do the lookups by IP.


I did not realize your ACLs use DNS lookups. Squid internal DNS code
does not have any runtime SMP locks. However, the presence of DNS
lookups increases the number of suspects.


they don't, everything in my test environment is by IP. But I've seen other 
software that still runs everything through name lookups, even if what's 
presented to the software (both in what's requested and in the ACLs) is all 
done by IPs. It's an easy way to bullet-proof the input (if it's a name it 
gets resolved, if it's an IP, the IP comes back as-is, and it works for IPv4 
and IPv6, no need to have logic that looks at the value and tries to figure 
out if the user intended to type a name or an IP). I don't know how squid is 
working internally (it's a pretty large codebase, and I haven't tried to 
really dive into it) so I don't know if squid does this or not.



A patch you propose does not sound difficult to me, but since I cannot
contribute such a patch soon, it is probably better to test with ACLs
that do not require any DNS lookups instead.



if that regains the speed and/or scalability it would point fingers
fairly conclusively at the DNS components.

this is the only thing that I can think of that should be shared between
multiple workers processing ACLs


but it is _not_ currently shared from Squid's point of view.


Ok, I was assuming from the description of things that there would be one DNS 
process that all the workers would be accessing. from the way it's described 
in the documentation it sounds as if it's already a separate process, so I 
was thinking that it was possible that if each ACL IP address is being put 
through a single DNS process, I could be running into contention on that 
process (and having to do name lookups for both IPv6 and then falling back to 
IPv4 would explain the severe performance hit far more than the difference 
between IPs being 128 bit values instead of 32 bit values)


David Lang




Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-04-25 Thread david

On Mon, 25 Apr 2011, Alex Rousskov wrote:


On 04/25/2011 06:14 PM, da...@lang.hm wrote:

if that regains the speed and/or scalability it would point fingers
fairly conclusively at the DNS components.

this is the only thing that I can think of that should be shared between
multiple workers processing ACLs


but it is _not_ currently shared from Squid's point of view.


Ok, I was assuming from the description of things that there would be
one DNS process that all the workers would be accessing. from the way
it's described in the documentation it sounds as if it's already a
separate process


I would like to fix that documentation, but I cannot find what phrase
led you to the above conclusion. The SmpScale wiki page says:


Currently, Squid workers do not share and do not synchronize other
resources or services, including:

* DNS caches (ipcache and fqdncache);


So that seems to be correct and clear. Which documentation are you
referring to?


ahh, I missed that, I was going by the description of the config options 
that configure and disable the DNS cache (they don't say anything about 
the SMP mode, but I read them to imply that the squid-internal DNS cache 
was a separate thread/process)


David Lang


Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-04-25 Thread Alex Rousskov
On 04/25/2011 06:14 PM, da...@lang.hm wrote:
>>> if that regains the speed and/or scalability it would point fingers
>>> fairly conclusively at the DNS components.
>>>
>>> this is the only think that I can think of that should be shared between
>>> multiple workers processing ACLs
>>
>> but it is _not_ currently shared from Squid point of view.
> 
> Ok, I was assuming from the description of things that there would be
> one DNS process that all the workers would be accessing. from the way
> it's described in the documentation it sounds as if it's already a
> separate process

I would like to fix that documentation, but I cannot find what phrase
led you to the above conclusion. The SmpScale wiki page says:

> Currently, Squid workers do not share and do not synchronize other
> resources or services, including:
> 
> * DNS caches (ipcache and fqdncache);

So that seems to be correct and clear. Which documentation are you
referring to?


Thank you,

Alex.


Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-04-25 Thread david

On Mon, 25 Apr 2011, Alex Rousskov wrote:


On 04/25/2011 05:31 PM, da...@lang.hm wrote:

On Mon, 25 Apr 2011, da...@lang.hm wrote:

On Mon, 25 Apr 2011, Alex Rousskov wrote:

On 04/14/2011 09:06 PM, da...@lang.hm wrote:


In addition, there seems to be some sort of locking between the multiple
worker processes in 3.2 when checking the ACLs


There are pretty much no locks in the current official SMP code. This
will change as we start adding shared caches in a week or so, but even
then the ACLs will remain lock-free. There could be some internal
locking in the 3rd-party libraries used by ACLs (regex and such), but I
do not know much about them.


what are the 3rd party libraries that I would be using?


See "ldd squid". Here is a sample based on a randomly picked Squid:

   libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol

Please note that I am not saying that any of these have problems in an SMP
environment. I am only saying that Squid itself does not lock anything at
runtime, so if our suspect is SMP-related locks, they would have to
reside elsewhere. The other possibility is that we should suspect
something else, of course. IMHO, it is more likely to be something else:
after all, Squid does not use threads, where such problems are expected.




BTW, do you see more-or-less even load across CPU cores? If not, you may
need a patch that we find useful on older Linux kernels. It is discussed
in the "Will similar workers receive similar amount of work?" section of
http://wiki.squid-cache.org/Features/SmpScale


the load is pretty even across all workers.

with the problems described on that page, I would expect uneven 
utilization at low loads, but at high loads (with the workers busy 
servicing requests rather than waiting for new connections), I would 
expect the work to even out (and the types of hacks described in that 
section to end up costing performance, but not in a way that would scale 
with the ACL processing load)



one thought I had is that this could be locking on name lookups. how
hard would it be to create a quick patch that would bypass the name
lookups entirely and only do the lookups by IP.


I did not realize your ACLs use DNS lookups. Squid internal DNS code
does not have any runtime SMP locks. However, the presence of DNS
lookups increases the number of suspects.


they don't, everything in my test environment is by IP. But I've seen 
other software that still runs everything through name lookups, even if 
what's presented to the software (both in what's requested and in the 
ACLs) is all done by IPs. It's an easy way to bullet-proof the input (if 
it's a name it gets resolved, if it's an IP, the IP comes back as-is, and 
it works for IPv4 and IPv6, no need to have logic that looks at the value 
and tries to figure out if the user intended to type a name or an IP). I 
don't know how squid is working internally (it's a pretty large codebase, 
and I haven't tried to really dive into it) so I don't know if squid does 
this or not.



A patch you propose does not sound difficult to me, but since I cannot
contribute such a patch soon, it is probably better to test with ACLs
that do not require any DNS lookups instead.



if that regains the speed and/or scalability it would point fingers
fairly conclusively at the DNS components.

this is the only thing that I can think of that should be shared between
multiple workers processing ACLs


but it is _not_ currently shared from Squid's point of view.


Ok, I was assuming from the description of things that there would be one 
DNS process that all the workers would be accessing. from the way it's 
described in the documentation it sounds as if it's already a separate 
process, so I was thinking that it was possible that if each ACL IP 
address is being put through a single DNS process, I could be running into 
contention on that process (and having to do name lookups for both IPv6 
and then falling back to IPv4 would explain the severe performance hit far 
more than the difference between IPs being 128 bit values instead of 32 
bit values)


David Lang



Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-04-25 Thread Alex Rousskov
On 04/25/2011 05:31 PM, da...@lang.hm wrote:
> On Mon, 25 Apr 2011, da...@lang.hm wrote: 
>> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>>> On 04/14/2011 09:06 PM, da...@lang.hm wrote:
>>>
 In addition, there seems to be some sort of locking betwen the multiple
 worker processes in 3.2 when checking the ACLs
>>>
>>> There are pretty much no locks in the current official SMP code. This
>>> will change as we start adding shared caches in a week or so, but even
>>> then the ACLs will remain lock-free. There could be some internal
>>> locking in the 3rd-party libraries used by ACLs (regex and such), but I
>>> do not know much about them.
>>
>> what are the 3rd party libraries that I would be using?

See "ldd squid". Here is a sample based on a randomly picked Squid:

libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol

Please note that I am not saying that any of these have problems in an SMP
environment. I am only saying that Squid itself does not lock anything at
runtime, so if our suspect is SMP-related locks, they would have to
reside elsewhere. The other possibility is that we should suspect
something else, of course. IMHO, it is more likely to be something else:
after all, Squid does not use threads, where such problems are expected.

BTW, do you see more-or-less even load across CPU cores? If not, you may
need a patch that we find useful on older Linux kernels. It is discussed
in the "Will similar workers receive similar amount of work?" section of
http://wiki.squid-cache.org/Features/SmpScale


> one thought I had is that this could be locking on name lookups. how
> hard would it be to create a quick patch that would bypass the name
> lookups entirely and only do the lookups by IP.

I did not realize your ACLs use DNS lookups. Squid internal DNS code
does not have any runtime SMP locks. However, the presence of DNS
lookups increases the number of suspects.

A patch you propose does not sound difficult to me, but since I cannot
contribute such a patch soon, it is probably better to test with ACLs
that do not require any DNS lookups instead.


> if that regains the speed and/or scalability it would point fingers
> fairly conclusively at the DNS components.
> 
> this is the only think that I can think of that should be shared between
> multiple workers processing ACLs

but it is _not_ currently shared from Squid's point of view.


Cheers,

Alex.


Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-04-25 Thread david

On Mon, 25 Apr 2011, da...@lang.hm wrote:


On Mon, 25 Apr 2011, Alex Rousskov wrote:


On 04/14/2011 09:06 PM, da...@lang.hm wrote:


In addition, there seems to be some sort of locking between the multiple
worker processes in 3.2 when checking the ACLs


There are pretty much no locks in the current official SMP code. This
will change as we start adding shared caches in a week or so, but even
then the ACLs will remain lock-free. There could be some internal
locking in the 3rd-party libraries used by ACLs (regex and such), but I
do not know much about them.


what are the 3rd party libraries that I would be using?


one thought I had is that this could be locking on name lookups. how hard 
would it be to create a quick patch that would bypass the name lookups 
entirely and only do the lookups by IP.


if that regains the speed and/or scalability it would point fingers fairly 
conclusively at the DNS components.


this is the only thing that I can think of that should be shared between 
multiple workers processing ACLs


David Lang


Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-04-25 Thread david

On Mon, 25 Apr 2011, Alex Rousskov wrote:


On 04/14/2011 09:06 PM, da...@lang.hm wrote:

Ok, I finally got a chance to test 2.7STABLE9

it performs about the same as squid 3.0, possibly a little better.

with my somewhat stripped down config (smaller regex patterns, replacing
CIDR blocks and names that would need to be looked up in /etc/hosts with
individual IP addresses)

2.7 gives ~4800 requests/sec
3.0 gives ~4600 requests/sec
3.2.0.6 with 1 worker gives ~1300 requests/sec
3.2.0.6 with 5 workers gives ~2800 requests/sec


Glad you did not see a significant regression between v2.7 and v3.0. We
have heard rather different stories. Every environment is different, and
many lab tests are misguided, of course, but it is still good to hear
positive reports.

The difference between v3.2 and v3.0 is known and has been discussed on
squid-dev. A few specific culprits are also known, but more need to be
identified. We are working on identifying these performance bugs and
reducing that difference.


let me know if there are any tests that I can run that will help you.


As for 1 versus 5 worker difference, it seems to be specific to your
environment (as discussed below).



the numbers for 3.0 are slightly better than what I was getting with the
full ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I
got from the last round of tests (with either the full or simplified
ruleset)

so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and
the ability to use multiple worker processes in 3.2 doesn't make up for
this.

the time taken seems to almost all be in the ACL evaluation as
eliminating all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec.


If ACLs are the major culprit in your environment, then this is most
likely not a problem in Squid source code. AFAIK, there are no locks or
other synchronization primitives/overheads when it comes to Squid ACLs.
The solution may lie in optimizing some 3rd-party libraries (used by
ACLs) or in optimizing how they are used by Squid, depending on what
ACLs you use. As far as Squid-specific code is concerned, you should see
nearly linear ACL scale with the number of workers.


given that my ACLs are IP/port matches or regex matches (and I've tested 
replacing the regex matches with IP matches with no significant change in 
performance), what components would be used?





one theory is that even though I have IPv6 disabled on this build, the
added space and more expensive checks needed to compare IPv6 addresses
instead of IPv4 addresses accounts for the single worker drop of ~66%.
that seems rather expensive, even though there are 293 http_access lines
(and one of them uses external file contents in its acls, so it's a
total of ~2400 source/destination pairs, however due to the ability to
shortcut the comparison the number of tests that need to be done should
be <400)


Yes, IPv6 is one of the known major performance regression culprits, but
IPv6 ACLs should still scale linearly with the number of workers, AFAICT.

Please note that I am not an ACL expert. I am just talking from the
overall Squid SMP design point of view and from our testing/deployment
experience point of view.


that makes sense and is what I would have expected, but in my case (lots 
of ACLs) I am seeing a definite problem with more workers not completing 
more work, and beyond about 5 workers I am seeing the total work being 
completed drop. I can't think of any reason besides locking that this may 
be the case.



In addition, there seems to be some sort of locking between the multiple
worker processes in 3.2 when checking the ACLs


There are pretty much no locks in the current official SMP code. This
will change as we start adding shared caches in a week or so, but even
then the ACLs will remain lock-free. There could be some internal
locking in the 3rd-party libraries used by ACLs (regex and such), but I
do not know much about them.


what are the 3rd party libraries that I would be using?

David Lang



HTH,

Alex.



On Wed, 13 Apr 2011, Marcos wrote:


Hi David,

could you run and publish your benchmark with squid 2.7 ???
i'd like to know if is there any regression between 2.7 and 3.x series.

thanks.

Marcos


- Original Message 
From: "da...@lang.hm" 
To: Amos Jeffries 
Cc: squid-users@squid-cache.org; squid-...@squid-cache.org
Sent: Saturday, 9 April 2011 12:56:12
Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues

On Sat, 9 Apr 2011, Amos Jeffries wrote:


On 09/04/11 14:27, da...@lang.hm wrote:

A couple more things about the ACLs used in my test

all of them are allow ACLs (no deny rules to worry about precedence of)
except for a deny-all at the bottom

the ACL line that permits the test source to the test destination has
zero overlap with the rest of the rules

every rule has an IP based restriction (even the ones with
url_regex are
source -> URL regex)

I moved the ACL that allows my test from the bottom of the ruleset to
the top and 

Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-04-25 Thread Alex Rousskov
On 04/14/2011 09:06 PM, da...@lang.hm wrote:
> Ok, I finally got a chance to test 2.7STABLE9
> 
> it performs about the same as squid 3.0, possibly a little better.
> 
> with my somewhat stripped down config (smaller regex patterns, replacing
> CIDR blocks and names that would need to be looked up in /etc/hosts with
> individual IP addresses)
> 
> 2.7 gives ~4800 requests/sec
> 3.0 gives ~4600 requests/sec
> 3.2.0.6 with 1 worker gives ~1300 requests/sec
> 3.2.0.6 with 5 workers gives ~2800 requests/sec

Glad you did not see a significant regression between v2.7 and v3.0. We
have heard rather different stories. Every environment is different, and
many lab tests are misguided, of course, but it is still good to hear
positive reports.

The difference between v3.2 and v3.0 is known and has been discussed on
squid-dev. A few specific culprits are also known, but more need to be
identified. We are working on identifying these performance bugs and
reducing that difference.

As for 1 versus 5 worker difference, it seems to be specific to your
environment (as discussed below).


> the numbers for 3.0 are slightly better than what I was getting with the
> full ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I
> got from the last round of tests (with either the full or simplified
> ruleset)
> 
> so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and
> the ability to use multiple worker processes in 3.2 doesn't make up for
> this.
> 
> the time taken seems to almost all be in the ACL avaluation as
> eliminating all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec.

If ACLs are the major culprit in your environment, then this is most
likely not a problem in Squid source code. AFAIK, there are no locks or
other synchronization primitives/overheads when it comes to Squid ACLs.
The solution may lie in optimizing some 3rd-party libraries (used by
ACLs) or in optimizing how they are used by Squid, depending on what
ACLs you use. As far as Squid-specific code is concerned, you should see
nearly linear ACL scale with the number of workers.


> one theory is that even though I have IPv6 disabled on this build, the
> added space and more expensive checks needed to compare IPv6 addresses
> instead of IPv4 addresses accounts for the single worker drop of ~66%.
> that seems rather expensive, even though there are 293 http_access lines
> (and one of them uses external file contents in it's acls, so it's a
> total of ~2400 source/destination pairs, however due to the ability to
> shortcut the comparison the number of tests that need to be done should
> be <400)

Yes, IPv6 is one of the known major performance regression culprits, but
IPv6 ACLs should still scale linearly with the number of workers, AFAICT.

Please note that I am not an ACL expert. I am just talking from the
overall Squid SMP design point of view and from our testing/deployment
experience point of view.


> In addition, there seems to be some sort of locking betwen the multiple
> worker processes in 3.2 when checking the ACLs

There are pretty much no locks in the current official SMP code. This
will change as we start adding shared caches in a week or so, but even
then the ACLs will remain lock-free. There could be some internal
locking in the 3rd-party libraries used by ACLs (regex and such), but I
do not know much about them.


HTH,

Alex.


>> On Wed, 13 Apr 2011, Marcos wrote:
>>
>>> Hi David,
>>>
>>> could you run and publish your benchmark with squid 2.7 ???
>>> i'd like to know if is there any regression between 2.7 and 3.x series.
>>>
>>> thanks.
>>>
>>> Marcos
>>>
>>>
>>> - Mensagem original 
>>> De: "da...@lang.hm" 
>>> Para: Amos Jeffries 
>>> Cc: squid-users@squid-cache.org; squid-...@squid-cache.org
>>> Enviadas: S?bado, 9 de Abril de 2011 12:56:12
>>> Assunto: Re: [squid-users] squid 3.2.0.5 smp scaling issues
>>>
>>> On Sat, 9 Apr 2011, Amos Jeffries wrote:
>>>
 On 09/04/11 14:27, da...@lang.hm wrote:
> A couple more things about the ACLs used in my test
>
> all of them are allow ACLs (no deny rules to worry about precidence
> of)
> except for a deny-all at the bottom
>
> the ACL line that permits the test source to the test destination has
> zero overlap with the rest of the rules
>
> every rule has an IP based restriction (even the ones with
> url_regex are
> source -> URL regex)
>
> I moved the ACL that allows my test from the bottom of the ruleset to
> the top and the resulting performance numbers were up as if the other
> ACLs didn't exist. As such it is very clear that 3.2 is evaluating
> every
> rule.
>
> I changed one of the url_regex rules to just match one line rather
> than
> a file containing 307 lines to see if that made a difference, and it
> made no significant difference. So this indicates to me that it's not
> having to fully evaluate every rule (it's able to skip doing the regex
> 

Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-04-22 Thread david
ping, I haven't seen a response to this additional information that I sent 
out last week.


squid 3.1 and 3.2 are a significant regression in performance from squid 
2.7 or 3.0


David Lang

On Thu, 14 Apr 2011, da...@lang.hm wrote:


Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

Ok, I finally got a chance to test 2.7STABLE9

it performs about the same as squid 3.0, possibly a little better.

with my somewhat stripped down config (smaller regex patterns, replacing CIDR 
blocks and names that would need to be looked up in /etc/hosts with 
individual IP addresses)


2.7 gives ~4800 requests/sec
3.0 gives ~4600 requests/sec
3.2.0.6 with 1 worker gives ~1300 requests/sec
3.2.0.6 with 5 workers gives ~2800 requests/sec

the numbers for 3.0 are slightly better than what I was getting with the full 
ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I got from 
the last round of tests (with either the full or simplified ruleset)


so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and the 
ability to use multiple worker processes in 3.2 doesn't make up for this.


the time taken seems to almost all be in the ACL evaluation as eliminating 
all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec.


one theory is that even though I have IPv6 disabled on this build, the added 
space and more expensive checks needed to compare IPv6 addresses instead of 
IPv4 addresses accounts for the single worker drop of ~66%. that seems rather 
expensive, even though there are 293 http_access lines (and one of them uses 
external file contents in its acls, so it's a total of ~2400 
source/destination pairs, however due to the ability to shortcut the 
comparison the number of tests that need to be done should be <400)




In addition, there seems to be some sort of locking between the multiple 
worker processes in 3.2 when checking the ACLs as the test with almost no 
ACLs scales close to 100% per worker while with the ACLs it scales much more 
slowly, and above 4-5 workers actually drops off dramatically (to the point 
where with 8 workers the throughput is down to about what you get with 1-2 
workers). I don't see any conceptual reason why the ACL checks of the 
different worker threads should impact each other in any way, let alone in a 
way that limits scalability to ~4 workers before adding more workers is a net 
loss.


David Lang



On Wed, 13 Apr 2011, Marcos wrote:


Hi David,

could you run and publish your benchmark with squid 2.7 ???
i'd like to know if is there any regression between 2.7 and 3.x series.

thanks.

Marcos


- Original Message 
From: "da...@lang.hm" 
To: Amos Jeffries 
Cc: squid-users@squid-cache.org; squid-...@squid-cache.org
Sent: Saturday, 9 April 2011 12:56:12
Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues

On Sat, 9 Apr 2011, Amos Jeffries wrote:


On 09/04/11 14:27, da...@lang.hm wrote:

A couple more things about the ACLs used in my test

all of them are allow ACLs (no deny rules to worry about precedence of)
except for a deny-all at the bottom

the ACL line that permits the test source to the test destination has
zero overlap with the rest of the rules

every rule has an IP based restriction (even the ones with url_regex are
source -> URL regex)

I moved the ACL that allows my test from the bottom of the ruleset to
the top and the resulting performance numbers were up as if the other
ACLs didn't exist. As such it is very clear that 3.2 is evaluating every
rule.

I changed one of the url_regex rules to just match one line rather than
a file containing 307 lines to see if that made a difference, and it
made no significant difference. So this indicates to me that it's not
having to fully evaluate every rule (it's able to skip doing the regex
if the IP match doesn't work)

I then changed all the acl lines that used hostnames to have IP
addresses in them, and this also made no significant difference

I then changed all subnet matches to single IP address (just nuked /##
throughout the config file) and this also made no significant 
difference.




Squid has always worked this way. It will *test* every rule from the top 
down to the one that matches. Also testing each line left-to-right until 
one fails or the whole line matches.




so why are the address matches so expensive?



3.0 and older IP address is a 32-bit comparison.
3.1 and newer IP address is a 128-bit comparison with memcmp().

If something like a word-wise comparison can be implemented faster than 
memcmp() we would welcome it.


I wonder if there should be a different version that's used when IPv6 is 
disabled. this is a pretty large hit.


if the data is aligned properly, on a 64 bit system this should still only 
be 2 compares. do you do any alignment on the data now?
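
For reference, a minimal sketch of that word-wise idea (purely 
illustrative; it assumes 16-byte address storage and is not Squid's actual 
address class) is just two 64-bit loads and compares:

#include <cstdint>
#include <cstring>

struct Addr128 {
    unsigned char bytes[16];   // an IPv6 (or IPv4-mapped) address
};

// Two 64-bit compares instead of a byte-wise memcmp() over 16 bytes.
// std::memcpy into a uint64_t is the portable way to express the load;
// compilers reduce it to a single mov on 64-bit targets.
inline bool sameAddress(const Addr128 &a, const Addr128 &b)
{
    std::uint64_t a0, a1, b0, b1;
    std::memcpy(&a0, a.bytes, 8);
    std::memcpy(&a1, a.bytes + 8, 8);
    std::memcpy(&b0, b.bytes, 8);
    std::memcpy(&b1, b.bytes + 8, 8);
    return a0 == b0 && a1 == b1;
}

Whether this actually beats the library memcmp() (which is usually already 
word-wise) is exactly the kind of thing that would need measuring.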



and as noted in the e-mail below, why do these checks not scale nicely
with the number of worker processes? If 

Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-04-14 Thread david

Ok, I finally got a chance to test 2.7STABLE9

it performs about the same as squid 3.0, possibly a little better.

with my somewhat stripped down config (smaller regex patterns, replacing 
CIDR blocks and names that would need to be looked up in /etc/hosts with 
individual IP addresses)


2.7 gives ~4800 requests/sec
3.0 gives ~4600 requests/sec
3.2.0.6 with 1 worker gives ~1300 requests/sec
3.2.0.6 with 5 workers gives ~2800 requests/sec

the numbers for 3.0 are slightly better than what I was getting with the 
full ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I 
got from the last round of tests (with either the full or simplified 
ruleset)


so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and the 
ability to use multiple worker processes in 3.2 doesn't make up for this.


the time taken seems to almost all be in the ACL evaluation as eliminating 
all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec.


one theory is that even though I have IPv6 disabled on this build, the 
added space and more expensive checks needed to compare IPv6 addresses 
instead of IPv4 addresses accounts for the single worker drop of ~66%. 
that seems rather expensive, even though there are 293 http_access lines 
(and one of them uses external file contents in its acls, so it's a total 
of ~2400 source/destination pairs, however due to the ability to shortcut 
the comparison the number of tests that need to be done should be <400)




In addition, there seems to be some sort of locking between the multiple 
worker processes in 3.2 when checking the ACLs as the test with almost no 
ACLs scales close to 100% per worker while with the ACLs it scales much 
more slowly, and above 4-5 workers actually drops off dramatically (to the 
point where with 8 workers the throughput is down to about what you get 
with 1-2 workers). I don't see any conceptual reason why the ACL checks of 
the different worker threads should impact each other in any way, let 
alone in a way that limits scalability to ~4 workers before adding more 
workers is a net loss.


David Lang



On Wed, 13 Apr 2011, Marcos wrote:


Hi David,

could you run and publish your benchmark with squid 2.7 ???
i'd like to know if is there any regression between 2.7 and 3.x series.

thanks.

Marcos


- Original Message 
From: "da...@lang.hm" 
To: Amos Jeffries 
Cc: squid-users@squid-cache.org; squid-...@squid-cache.org
Sent: Saturday, 9 April 2011 12:56:12
Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues

On Sat, 9 Apr 2011, Amos Jeffries wrote:


On 09/04/11 14:27, da...@lang.hm wrote:

A couple more things about the ACLs used in my test

all of them are allow ACLs (no deny rules to worry about precedence of)
except for a deny-all at the bottom

the ACL line that permits the test source to the test destination has
zero overlap with the rest of the rules

every rule has an IP based restriction (even the ones with url_regex are
source -> URL regex)

I moved the ACL that allows my test from the bottom of the ruleset to
the top and the resulting performance numbers were up as if the other
ACLs didn't exist. As such it is very clear that 3.2 is evaluating every
rule.

I changed one of the url_regex rules to just match one line rather than
a file containing 307 lines to see if that made a difference, and it
made no significant difference. So this indicates to me that it's not
having to fully evaluate every rule (it's able to skip doing the regex
if the IP match doesn't work)

I then changed all the acl lines that used hostnames to have IP
addresses in them, and this also made no significant difference

I then changed all subnet matches to single IP address (just nuked /##
throughout the config file) and this also made no significant difference.



Squid has always worked this way. It will *test* every rule from the top 
down to the one that matches. Also testing each line left-to-right until 
one fails or the whole line matches.




so why are the address matches so expensive?



3.0 and older IP address is a 32-bit comparison.
3.1 and newer IP address is a 128-bit comparison with memcmp().

If something like a word-wise comparison can be implemented faster than 
memcmp() we would welcome it.


I wonder if there should be a different version that's used when IPv6 is 
disabled. this is a pretty large hit.


if the data is aligned properly, on a 64 bit system this should still only 
be 2 compares. do you do any alignment on the data now?



and as noted in the e-mail below, why do these checks not scale nicely
with the number of worker processes? If they did, the fact that one 3.2
process is about 1/3 the speed of a 3.0 process in checking the acls
wouldn't matter nearly as much when it's so easy to get an 8+ core 
system.




There you have the unknown.


I think this is a fairly critical thing to figure out.


Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

2011-04-13 Thread david
sorry, haven't had time to do that yet. I will try and get this done 
today.


David Lang

On Wed, 13 Apr 2011, Marcos wrote:


Date: Wed, 13 Apr 2011 04:11:09 -0700 (PDT)
From: Marcos 
To: da...@lang.hm, Amos Jeffries 
Cc: squid-users@squid-cache.org, squid-...@squid-cache.org
Subject: Res: [squid-users] squid 3.2.0.5 smp scaling issues

Hi David,

could you run and publish your benchmark with squid 2.7 ???
i'd like to know if is there any regression between 2.7 and 3.x series.

thanks.

Marcos


- Original Message 
From: "da...@lang.hm" 
To: Amos Jeffries 
Cc: squid-users@squid-cache.org; squid-...@squid-cache.org
Sent: Saturday, 9 April 2011 12:56:12
Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues

On Sat, 9 Apr 2011, Amos Jeffries wrote:


On 09/04/11 14:27, da...@lang.hm wrote:

A couple more things about the ACLs used in my test

all of them are allow ACLs (no deny rules to worry about precedence of)
except for a deny-all at the bottom

the ACL line that permits the test source to the test destination has
zero overlap with the rest of the rules

every rule has an IP based restriction (even the ones with url_regex are
source -> URL regex)

I moved the ACL that allows my test from the bottom of the ruleset to
the top and the resulting performance numbers were up as if the other
ACLs didn't exist. As such it is very clear that 3.2 is evaluating every
rule.

I changed one of the url_regex rules to just match one line rather than
a file containing 307 lines to see if that made a difference, and it
made no significant difference. So this indicates to me that it's not
having to fully evaluate every rule (it's able to skip doing the regex
if the IP match doesn't work)

I then changed all the acl lines that used hostnames to have IP
addresses in them, and this also made no significant difference

I then changed all subnet matches to single IP address (just nuked /##
throughout the config file) and this also made no significant difference.



Squid has always worked this way. It will *test* every rule from the top down 
to the one that matches. Also testing each line left-to-right until one fails or 
the whole line matches.




so why are the address matches so expensive?



3.0 and older IP address is a 32-bit comparison.
3.1 and newer IP address is a 128-bit comparison with memcmp().

If something like a word-wise comparison can be implemented faster than 
memcmp() we would welcome it.


I wonder if there should be a different version that's used when IPv6 is 
disabled. this is a pretty large hit.


if the data is aligned properly, on a 64 bit system this should still only be 2 
compares. do you do any alignment on the data now?



and as noted in the e-mail below, why do these checks not scale nicely
with the number of worker processes? If they did, the fact that one 3.2
process is about 1/3 the speed of a 3.0 process in checking the acls
wouldn't matter nearly as much when it's so easy to get an 8+ core system.



There you have the unknown.


I think this is a fairly critical thing to figure out.



it seems to me that all accept/deny rules in a set should be able to be
combined into a tree to make searching them very fast.

so for example if you have

accept 1
accept 2
deny 3
deny 4
accept 5

you need to create three trees (one with accept 1 and accept 2, one with
deny 3 and deny 4, and one with accept 5) and then check each tree to see
if you have a match.

the types of match could be done in order of increasing cost, so if you


The config file is a specific structure configured by the admin under guaranteed 
rules of operation for access lines (top-down, left-to-right, first-match-wins) 
to perform boolean-logic calculations using ACL sets.

Sorting access line rules is not an option.
Sorting ACL values and tree-forming them is already done (regex being the one 
exception AFAIK).
Sorting position-wise on a single access line is also ruled out by interactions 
with deny_info, auth and external ACL types.


It would seem that as long as you don't cross boundaries between the different 
types, you should be able to optimize within a group.


Using my example above, you couldn't combine the 'accept 5' with any of the 
other accepts, but you could combine accept 1 and 2 and combine deny 3 and 4 
together.


now, I know that I don't fully understand all the possible ACL types, so this 
may not work for some of them, but I believe that a fairly common use case is to 
have either a lot of allow rules, or a lot of deny rules as a block (either a 
list of sites you are allowed to access, or a list of sites that are blocked), 
so an ability to optimize these use cases may be well worth it.
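
To make that concrete, a toy sketch of the grouping (not Squid code, and 
limited to runs of consecutive lines whose ACLs are plain source-IP values) 
could collapse each contiguous same-action run into one indexed lookup:

#include <string>
#include <unordered_set>
#include <vector>

// One group = a contiguous run of access lines sharing the same action,
// with all of their source-IP values merged into a single set. Because the
// run was contiguous in the config, first-match-wins order is preserved.
struct RuleGroup {
    bool allow;                              // the action shared by the run
    std::unordered_set<std::string> srcIps;  // values from every line in the run
};

// One hash lookup per group replaces N sequential per-line comparisons.
bool checkAccess(const std::vector<RuleGroup> &groups,
                 const std::string &clientIp, bool dflt)
{
    for (const auto &g : groups)
        if (g.srcIps.count(clientIp))
            return g.allow;
    return dflt;   // e.g. the trailing deny-all
}

As noted above, Squid already sorts and tree-forms the values inside a 
single acl; the open question is whether merging across adjacent lines of 
the same action and type is worth the interactions Amos lists (deny_info, 
auth, external ACLs).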



have acl entries of type port, src, dst, and url regex, organize the
tree so that you check ports first, then src, then dst, then only if all
that matches do you need to do the regex. This would be very similar to
the shortcut logic that you use today with a single rule where you bail
out when you don't