Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-08 Thread Robert Watson
On Mon, 7 Jun 2004, Ali Niknam wrote:

> > There isn't a timeout.  Rather, the lock spins so long as the current
> > owning thread is executing on another CPU.
> 
> Interesting. Is there a way to 'lock' CPU's so that they always run on
> 'another' CPU ?
> 
> Unfortunately as we speak the server is down again :( This all makes me
> wonder whether I should simply go back to 4.10.
> I decreased the maximum number of apache children to 1400 and the server
> seems to be barely holding on:
> last pid:  2483;  load averages: 75.77, 28.63, 11.40    up 0+00:04:32  19:35:07
> 1438 processes: 2 running, 294 sleeping, 1142 lock
> CPU states:  6.2% user,  0.0% nice, 62.6% system,  7.5% interrupt, 23.8% idle
> Mem: 698M Active, 27M Inact, 209M Wired, 440K Cache, 96M Buf, 1068M Free
> Swap: 512M Total, 512M Free
>
> Are there any other fairly stable things to try? That is, apart from upping
> to -CURRENT, which I frankly feel is too dangerous...

Is there any way you can give us a "top -S" output snapshot of your full
set of processes, if necessary omitting sensitive process names, etc?

Also, can you give a snapshot of "systat -vmstat" once it's settled for a
few iterations?

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research


___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-08 Thread Robert Watson

On Mon, 7 Jun 2004, Ali Niknam wrote:

> > There isn't a timeout.  Rather, the lock spins so long as the current
> > owning thread is executing on another CPU.
> 
> Interesting. Is there a way to 'lock' CPU's so that they always run on
> 'another' CPU ?
> 
> Unfortunately as we speak the server is down again :( This all makes me
> wonder whether I should simply go back to 4.10.

No one would blame you for backing off -CURRENT to -STABLE.  On the other
hand, having high workloads against -CURRENT is going to be critical to
identifying weaknesses in -CURRENT so we can improve them.  Unfortunately,
it's something of a chicken-and-egg problem...

> I decreased the maximum number of apache children to 1400 and the server
> seems to be barely holding on:
> last pid:  2483;  load averages: 75.77, 28.63, 11.40    up 0+00:04:32  19:35:07
> 1438 processes: 2 running, 294 sleeping, 1142 lock
> CPU states:  6.2% user,  0.0% nice, 62.6% system,  7.5% interrupt, 23.8% idle
> Mem: 698M Active, 27M Inact, 209M Wired, 440K Cache, 96M Buf, 1068M Free
> Swap: 512M Total, 512M Free
>
> Are there any other fairly stable things to try? That is, apart from upping
> to -CURRENT, which I frankly feel is too dangerous...

There are a number of known weaknesses in 5.2.1 that are resolved in
-CURRENT, but the update would also involve substantial risk as there's
some heavy moving going on in -CURRENT to improve network performance,
etc.  I haven't followed some of your system description in detail, but
it seems like the primary thing to do right now, assuming you are still
able to keep 5.2.1 running on the box and are able to futz with the
configuration some, is to identify the specific source of the problem
you're experiencing.  Clearly, too much work is going on in the kernel. 
The question is, what work.  It's likely you're running into an expensive
edge case, it's possible it's resolved in HEAD, and it could be that a low
risk back port would resolve it.  It's also possible you're running into
an unresolved problem in HEAD.

The best case scenario from my perspective would be that you could provide
an equivalent workload against a test box where we could experiment with a
number of debugging settings, as well as simply trying -CURRENT...  It
sounds like we've tried some of the easy plugs, such as switching
schedulers, enabling adaptive mutexes, etc.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research



Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-08 Thread Ali Niknam
> There isn't a timeout.  Rather, the lock spins so long as the current
> owning thread is executing on another CPU.

Interesting. Is there a way to 'lock' CPU's so that they always run on
'another' CPU ?

Unfortunately as we speak the server is down again :( This all makes me
wonder whether I should simply go back to 4.10.
I decreased the maximum number of apache children to 1400 and the server
seems to be barely holding on:
last pid:  2483;  load averages: 75.77, 28.63, 11.40    up 0+00:04:32  19:35:07
1438 processes: 2 running, 294 sleeping, 1142 lock
CPU states:  6.2% user,  0.0% nice, 62.6% system,  7.5% interrupt, 23.8% idle
Mem: 698M Active, 27M Inact, 209M Wired, 440K Cache, 96M Buf, 1068M Free
Swap: 512M Total, 512M Free
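
[Editor's sketch, not part of the original message.] As a rough aid for
reading the top summary above, the following Python snippet parses the
process-state line and reports what share of the processes sit in the
"lock" state. The function name parse_process_states is invented for
illustration, and the code assumes the line keeps the
"N processes: count state, ..." shape shown here.

```python
import re


def parse_process_states(line):
    """Parse a top(1)-style summary such as
    '1438 processes: 2 running, 294 sleeping, 1142 lock'.

    Returns (total, {state_name: count}).
    """
    match = re.match(r"\s*(\d+)\s+processes:\s*(.*)", line)
    if match is None:
        raise ValueError("not a process-summary line: %r" % line)
    total = int(match.group(1))
    states = {}
    # Each comma-separated part looks like "294 sleeping".
    for part in match.group(2).split(","):
        count, name = part.split()
        states[name] = int(count)
    return total, states


total, states = parse_process_states(
    "1438 processes: 2 running, 294 sleeping, 1142 lock")
lock_share = states["lock"] / total
print("%d of %d processes in lock state (%.1f%%)"
      % (states["lock"], total, 100 * lock_share))
```

For the figures quoted in this message that is 1142 of 1438 processes,
roughly 79%, waiting on a lock.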


Are there any other fairly stable things to try? That is, apart from upping
to -CURRENT, which I frankly feel is too dangerous...

-- 
 Ali Niknam <[EMAIL PROTECTED]> | tel 0182-504424 | fax 0182-504460
 Transip B.V. | http://www.transip.nl/ | Mensen met verstand van zaken.



Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-07 Thread Brian Feldman
On Sat, Jun 05, 2004 at 10:55:31PM +0200, Ali Niknam wrote:
> I tried this; this helps performance a lot, here are the findings:
>  - under all conditions turning on HTT helps a *lot* (which is logical i
> think)
>  - under non killing load (killing load = load where server would have
> crashed without this option) it performs much much better
>  - under killing load it performs a lot better, up until a certain level:
>  - a new killing level: from this point onward basically the same thing
> happens as before..

Something that should not be happening is going on at a much more
fundamental level.  Something is causing everything to block on Giant.
That could be some exceptionally long operation that executes, holding
Giant, without any context switches.  In general, this is really what
you would call a "deadlock," but at least you can recover from it.  If
the system is totally unresponsive to your input, is it still working
from the standpoint of the users on it?  Are there strange syslog
messages?  Can you watch the history of sysctl vm.vmtotal, sysctl vm.zone,
and vmstat -m to see if it's a memory starvation issue?
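
[Editor's sketch, not part of the original message.] One way to "watch
the history" of those three commands is to append timestamped snapshots
to a log file, so the last samples before a hang survive. The commands
are the ones named above; the Python wrapper, the function names
(run_command, snapshot, record) and the default interval are assumptions
made for illustration only.

```python
import subprocess
import time

# Commands taken from the message above (vm.zone is the 5.x-era sysctl).
COMMANDS = [
    ["sysctl", "vm.vmtotal"],
    ["sysctl", "vm.zone"],
    ["vmstat", "-m"],
]


def run_command(argv):
    """Run one command and return stdout and stderr together as text."""
    result = subprocess.run(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True, check=False)
    return result.stdout


def snapshot(commands=COMMANDS, runner=run_command, now=time.time):
    """Return one timestamped text block with the output of every command."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now()))
    lines = ["==== %s ====" % stamp]
    for argv in commands:
        lines.append("$ " + " ".join(argv))
        lines.append(runner(argv).rstrip())
    return "\n".join(lines) + "\n"


def record(path, interval_s=10, count=6):
    """Append `count` snapshots to `path`, `interval_s` seconds apart."""
    with open(path, "a") as log:
        for i in range(count):
            log.write(snapshot())
            log.flush()  # hand each snapshot to the OS right away
            if i + 1 < count:
                time.sleep(interval_s)
```

Calling record("/var/tmp/vmhistory.log") while the load ramps up would
leave a file that can be compared sample by sample afterwards; the path
is just an example.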

-- 
Brian Fundakowski Feldman   \'[ FreeBSD ]''\
  <> [EMAIL PROTECTED]   \  The Power to Serve! \
 Opinions expressed are my own.   \,,\


Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-07 Thread John Baldwin
On Monday 07 June 2004 01:35 pm, Ali Niknam wrote:
> > There isn't a timeout.  Rather, the lock spins so long as the current
> > owning thread is executing on another CPU.
>
> Interesting. Is there a way to 'lock' CPU's so that they always run on
> 'another' CPU ?

Not in userland, no.

> Unfortunately as we speak the server is down again :( This all makes me
> wonder whether I should simply go back to 4.10.
> I decreased the maximum number of apache children to 1400 and the server
> seems to be barely holding on:
> last pid:  2483;  load averages: 75.77, 28.63, 11.40    up 0+00:04:32  19:35:07
> 1438 processes: 2 running, 294 sleeping, 1142 lock
> CPU states:  6.2% user,  0.0% nice, 62.6% system,  7.5% interrupt, 23.8% idle
> Mem: 698M Active, 27M Inact, 209M Wired, 440K Cache, 96M Buf, 1068M Free
> Swap: 512M Total, 512M Free
>
>
> Are there any other fairly stable things to try? That is, apart from upping
> to -CURRENT, which I frankly feel is too dangerous...

Nothing that I can think of off the top of my head.

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-07 Thread John Baldwin
On Saturday 05 June 2004 04:55 pm, Ali Niknam wrote:
> Hi Robert,
>
> As promised my findings regarding the changes; just came home after a night
> of trying and praying :)
>
> > Actually, by default, most mutexes in the system are sleep mutexes, so
> > they sleep on contention rather than spinning.  In some cases, this
> > actually hurts more than spinning, because if the mutex is released
> > quickly by the holder, then you pay the context switches which cost
> > more than spinning for the short period of time.
> >
> > You might want to try adding "options ADAPTIVE_MUTEXES" to your kernel
> > configuration, which will cause mutexes to spin briefly on SMP systems
> > before sleeping, and has been observed to improve performance quite a
> > bit.
>
> I tried this; this helps performance a lot, here are the findings:
>  - under all conditions turning on HTT helps a *lot* (which is logical i
> think)
>  - under non killing load (killing load = load where server would have
> crashed without this option) it performs much much better
>  - under killing load it performs a lot better, up until a certain level:
>  - a new killing level: from this point onward basically the same thing
> happens as before..
>
> What i'm guessing is that probably this new killing level occurs when load
> is so high that the spins 'adapt' into blocks. From your description above
> I understand that there's a certain timeout when 'spinning' mutexes turn
> into 'blocking'/'sleeping' mutexes. Is there a way to set this timeout ? I
> would very much like to try out what would happen if one would set this
> timeout to a quite high value.

There isn't a timeout.  Rather, the lock spins so long as the current owning 
thread is executing on another CPU.
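
[Editor's sketch, not part of the original message.] To make the rule
above concrete, here is a toy Python model of what a contending thread
does with an adaptive mutex: it keeps spinning only while the owner is
running on another CPU, and goes to sleep as soon as the owner is seen
off-CPU. This is not the kernel code; the names Action, adaptive_step
and contend are invented for illustration.

```python
from enum import Enum


class Action(Enum):
    ACQUIRE = "acquire"
    SPIN = "spin"
    SLEEP = "sleep"


def adaptive_step(owner, owner_on_cpu):
    """One decision by a thread that wants the mutex.

    owner        -- id of the thread holding the mutex, or None if free
    owner_on_cpu -- True if that owner is executing on another CPU now
    """
    if owner is None:
        return Action.ACQUIRE
    if owner_on_cpu:
        return Action.SPIN   # owner is making progress; keep polling
    return Action.SLEEP      # owner is not running; spinning is wasted


def contend(observations):
    """Replay (owner, owner_on_cpu) observations until the thread
    stops spinning. Returns (final_action, number_of_spins)."""
    spins = 0
    for owner, on_cpu in observations:
        action = adaptive_step(owner, on_cpu)
        if action is not Action.SPIN:
            return action, spins
        spins += 1
    return Action.SPIN, spins
```

Note that no timer appears anywhere: the only thing that ends the spin
is the lock becoming free or the owner leaving its CPU.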

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-06 Thread Ali Niknam
Hi Robert,

As promised my findings regarding the changes; just came home after a night
of trying and praying :)

> Actually, by default, most mutexes in the system are sleep mutexes, so
> they sleep on contention rather than spinning.  In some cases, this
> actually hurts more than spinning, because if the mutex is released
> quickly by the holder, then you pay the context switches which cost
> more than spinning for the short period of time.
>
> You might want to try adding "options ADAPTIVE_MUTEXES" to your kernel
> configuration, which will cause mutexes to spin briefly on SMP systems
> before sleeping, and has been observed to improve performance quite a
> bit.
>

I tried this; this helps performance a lot, here are the findings:
 - under all conditions turning on HTT helps a *lot* (which is logical i
think)
 - under non killing load (killing load = load where server would have
crashed without this option) it performs much much better
 - under killing load it performs a lot better, up until a certain level:
 - a new killing level: from this point onward basically the same thing
happens as before..

What i'm guessing is that probably this new killing level occurs when load
is so high that the spins 'adapt' into blocks. From your description above I
understand that there's a certain timeout when 'spinning' mutexes turn into
'blocking'/'sleeping' mutexes. Is there a way to set this timeout ? I would
very much like to try out what would happen if one would set this timeout to
a quite high value.

Apart from this I also tried options ZERO_COPY_SOCKET, but that didn't seem
to help much, if at all.

Furthermore I tried out SCHED_ULE which was dramatic! I'm not sure if i'm
the only one, but the performance was really terrible. i switched it off
again as soon as i could.

Also what I was wondering: do processes that go into sleep-mutex mode go
into the same waiting queue as normal processes, or do they go into a
special queue?
Could this problem basically boil down to the scheduler (or the context
switching) being too slow for this number of waiting/blocked processes? If so,
could it be an idea to put blocking processes into a special queue in which
the scheduler adopts a simple scheduling algorithm (such as first come, first
served, or random, or whatever) to dramatically reduce rescheduling time?

Best Regards,
Ali Niknam



Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-05 Thread Ali Niknam
> Welcome :-).
>

Thank you :)

> Actually, by default, most mutexes in the system are sleep mutexes, so
> they sleep on contention rather than spinning.  In some cases, this
> actually hurts more than spinning, because if the mutex is released
> quickly by the holder, then you pay the context switches which cost
> more than spinning for the short period of time.
>
> You might want to try adding "options ADAPTIVE_MUTEXES" to your kernel
> configuration, which will cause mutexes to spin briefly on SMP systems
> before sleeping, and has been observed to improve performance quite a
> bit.
>

Interesting; could switching to SCHED_ULE help as well? Since AFAIK the
processes get rescheduled?
Also, could this be the reason that the system uses so much CPU (around 70%
of overall CPU)? That it needs to reschedule ~1000 processes continuously?

> If you have a lower tolerance for instability, there are a number of
> minor performance tweaks that can be easily back-ported to 5.2.1,
> such as the change to proc.h to make grabbing and releasing the proc
> lock conditional on p_stops having events defined.  This removes
> several mutex operations from each system call, and I've observed the
> difference in a pretty measurable way on micro-benchmarks.  It's also
> pretty low risk.  The change is src/sys/sys/proc.h:1.366.  There are
> some other related changes that can probably be dug up, including
> changes to improve the performance of the scheduler in the presence
> of threads, etc.

if all else fails i'll start doing this, thanks for the suggestion!

-- 
 Ali Niknam <[EMAIL PROTECTED]> | tel 0182-504424 | fax 0182-504460
 Transip B.V. | http://www.transip.nl/ | Mensen met verstand van zaken.



Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-05 Thread Ali Niknam
Hi Alexander,

> I can't say anything as to how the issue can be connected with the
> mutexes and so on, but to solve your problem with apache, I'd look
> into 'hold_off_on_exponential_spawning' and 'MAX_SPAWN_RATE'
> parameters in src/main/http_main.c of the apache source tree
> (presuming you're using apache 1.3.*), and I'm sure some similar
> options can be found for apache 2.0. What you need is to make apache's
> forking rate slower, so the server will not suffer from a sudden load
> peak.

That was my first thought exactly! I halved the MAX_SPAWN_RATE to 16 (from
32) and then *exactly* the same thing happened; it only took a minute longer
to happen 

If I recall correctly (not sure anymore since it was the middle of the night)
other processes also got blocked (hence I couldn't use the keyboard anymore).

That was why I figured it was some kind of lock/block/mutex/whatever inside
the kernel.

Best Regards,
Ali Niknam



Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-04 Thread Steven Hartland

- Original Message - 
From: "Robert Watson" <[EMAIL PROTECTED]>

> Well, I know a bit about the general conditions -- situations where
> mutexes are held for short periods of time, rather than over long
> transactions.  I know from experience that adaptive mutexes can make an
> observable difference for system builds and IPC activities. 
> 
> To what extent do you have systems where you can reproduce your production
> load without impacting production quality?  I may have some interesting
> patches for you to try running with, if so :-).

Got two online dev boxes I can play with, with remote console, backup
etc., so I would definitely be willing to give them a shot, and we run
full performance stats on all the machines, so that would help determine
any gain. As you would expect, a game server is constantly doing
net IO, small amounts of disk IO and quite a chunk of logic
processing. The other thing of note is that virtually all game servers
run via the Linux binary compatibility (BC) layer, as game devs don't
generally have a native port for FreeBSD.

I'm also currently benchmarking a dual Opteron across a number
of OSes; any changes you have could also be tested on that bed
vs 5.2.1-RELEASE along with the rest.

Steve



This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or 
entity to whom it is addressed. In the event of misdirection, the recipient is 
prohibited from using, copying, printing or otherwise disseminating it or any 
information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone 
(023) 8024 3137
or return the E.mail to [EMAIL PROTECTED]



Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-04 Thread Robert Watson

On Fri, 4 Jun 2004, Steven Hartland wrote:

> > You might want to try adding "options ADAPTIVE_MUTEXES" to your kernel
> > configuration, which will cause mutexes to spin briefly on SMP systems
> > before sleeping, and has been observed to improve performance quite a bit.
> 
> That's very interesting. Is there a specific set of conditions where it
> would pay off that you guys know of, or is it more a try-it-and-see?
>
> We run ~100 dual machines here, the vast majority FreeBSD, as game
> servers and would consider upgrading the kernels if you thought it would
> help.

Well, I know a bit about the general conditions -- situations where
mutexes are held for short periods of time, rather than over long
transactions.  I know from experience that adaptive mutexes can make an
observable difference for system builds and IPC activities. 

To what extent do you have systems where you can reproduce your production
load without impacting production quality?  I may have some interesting
patches for you to try running with, if so :-).

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-04 Thread Steven Hartland

- Original Message - 
From: "Robert Watson" <[EMAIL PROTECTED]>

> You might want to try adding "options ADAPTIVE_MUTEXES" to your kernel
> configuration, which will cause mutexes to spin briefly on SMP systems
> before sleeping, and has been observed to improve performance quite a bit.

That's very interesting. Is there a specific set of conditions
where it would pay off that you guys know of, or is it more
a try-it-and-see?

We run ~100 dual machines here, the vast majority FreeBSD,
as game servers and would consider upgrading the kernels
if you thought it would help.

Steve






Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-04 Thread Robert Watson

On Thu, 3 Jun 2004, Ali Niknam wrote:

> First of all: this is my first posting in this group so please be gentle
> :)

Welcome :-).


> Now I unfortunately do not know enough about the internals of BSD to make a
> very educated guess, but I'll give it a shot nevertheless: my estimate is that
> due to the tremendous number of 'locked' processes the system simply starves
> of CPU to do anything. My guess is the locking mechanism probably uses
> some kind of 'spin' to wait until the resource is unlocked (whichever
> resource it is, probably something network related, though).


Actually, by default, most mutexes in the system are sleep mutexes, so
they sleep on contention rather than spinning.  In some cases, this
actually hurts more than spinning, because if the mutex is released
quickly by the holder, then you pay the context switches which cost more
than spinning for the short period of time.
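
[Editor's sketch, not part of the original message.] The trade-off in
the paragraph above can be written as a small cost comparison: spinning
burns CPU for as long as the holder keeps the mutex, while sleeping
costs roughly two context switches (one away, one back). The Python
below is a toy model; the function names and every number in the usage
lines are invented for illustration and are not measurements.

```python
def waiting_cost_us(remaining_hold_us, context_switch_us):
    """Return (spin_cost, sleep_cost) in microseconds of CPU time
    charged to one waiting thread, under a deliberately simple model."""
    spin_cost = remaining_hold_us        # busy-wait until release
    sleep_cost = 2 * context_switch_us   # switch out, later switch back in
    return spin_cost, sleep_cost


def cheaper(remaining_hold_us, context_switch_us):
    """Name the cheaper strategy for one waiter under this model."""
    spin_cost, sleep_cost = waiting_cost_us(remaining_hold_us,
                                            context_switch_us)
    return "spin" if spin_cost < sleep_cost else "sleep"


# Hypothetical figures: a 5 us context switch, short and long hold times.
print(cheaper(2, 5))    # mutex released quickly
print(cheaper(500, 5))  # mutex held across a long operation
```

With these made-up figures the short hold favours spinning and the long
hold favours sleeping, which is the case distinction described above.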

You might want to try adding "options ADAPTIVE_MUTEXES" to your kernel
configuration, which will cause mutexes to spin briefly on SMP systems
before sleeping, and has been observed to improve performance quite a bit.

> I would be very interested to hear what this problem could be; perhaps I
> can test a little if someone has solutions (I can't test much
> unfortunately, it's a production system).

As you may or may not be aware, improving locking and parallelism in
FreeBSD 5.x is a big on-going task, with a lot of activity.  A moderate
quantity of recent locking work has occurred since 5.2.1 release, so
depending on your tolerance for experimentation on this system, you might
wish to give 5-CURRENT a try.  Be warned that 5-CURRENT, while having a
number of performance enhancements, also has some stability regressions,
more recent ACPI code, etc.  I'm using older snapshots of 5-CURRENT in
production today, but generally not newer than about April or early May.
If you do try -CURRENT, take a look at UPDATING, and make sure to disable
a lot of the debugging features present if you're interested specifically
in performance.

If you have a lower tolerance for instability, there are a number of minor
performance tweaks that can be easily back-ported to 5.2.1, such as the
change to proc.h to make grabbing and releasing the proc lock conditional
on p_stops having events defined.  This removes several mutex operations
from each system call, and I've observed the difference in a pretty
measurable way on micro-benchmarks.  It's also pretty low risk.  The
change is src/sys/sys/proc.h:1.366.  There are some other related changes
that can probably be dug up, including changes to improve the performance
of the scheduler in the presence of threads, etc. 

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Senior Research Scientist, McAfee Research




Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-04 Thread Varshavchick Alexander
Hi Ali,

I can't say anything as to how the issue can be connected with the mutexes
and so on, but to solve your problem with apache, I'd look into
'hold_off_on_exponential_spawning' and 'MAX_SPAWN_RATE' parameters in
src/main/http_main.c of the apache source tree (presuming you're using
apache 1.3.*), and I'm sure some similar options can be found for apache
2.0. What you need is to make apache's forking rate slower, so the
server will not suffer from a sudden load peak.
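
[Editor's sketch, not part of the original message.] A rough Python
model of why MAX_SPAWN_RATE matters, assuming (as an editor's reading of
Apache 1.3, to be checked against http_main.c) that the number of
children forked per maintenance pass starts at 1 and doubles each pass
until it reaches MAX_SPAWN_RATE. The model ignores
hold_off_on_exponential_spawning and the idle-server thresholds, and
passes_to_reach is an invented name.

```python
def passes_to_reach(target_children, max_spawn_rate):
    """Count maintenance passes needed to fork `target_children`
    when the per-pass spawn rate doubles up to `max_spawn_rate`."""
    children = 0
    rate = 1
    passes = 0
    while children < target_children:
        # Never fork more than are still needed.
        children += min(rate, target_children - children)
        passes += 1
        if rate < max_spawn_rate:
            rate *= 2
    return passes


for cap in (32, 16):
    print("cap %d: %d passes to reach 1200 children"
          % (cap, passes_to_reach(1200, cap)))
```

Under these assumptions, reaching 1200 children takes 42 passes with a
cap of 32 and 79 passes with a cap of 16, so lowering the cap stretches
the ramp-up but does not change the final number of processes.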

Just my $0.02 :)


Alexander Varshavchick, Metrocom Joint Stock Company
Phone: (812)118-3322, 118-3115(fax)

On Thu, 3 Jun 2004, Ali Niknam wrote:

> Hi Guys,
>
> First of all: this is my first posting in this group so please be gentle :)
>
> The other day I was upgrading a system from FreeBSD 4.5 single CPU to
> FreeBSD 5.2.1 dual CPU and I came across a terrible problem.
>
> The system is used as a rather busy webserver, with continuously about 1200
> apache processes, and about 200 mysql pthreads.
>
> The problem I ran into is that when apache starts it needs to create a lot
> of children quickly. When it does so, after about a minute or so a couple
> of children go into "Giant" status. After a few seconds more and more
> processes go into the Giant state, up to the point that the system becomes
> totally unresponsive (even to keyboard input). The only remedy is to
> disconnect the UTP (network) cable and wait a few seconds; then kill
> everything.
>
> Now the nice part is: this happens only if i set apache's maxclients > 1250.
> Under 1250 the same scenario happens but after a minute or so the system
> recovers!
>
> Now I unfortunately do not know enough about the internals of BSD to make a
> very educated guess, but I'll give it a shot nevertheless: my estimate is that
> due to the tremendous number of 'locked' processes the system simply starves
> of CPU to do anything. My guess is the locking mechanism probably uses
> some kind of 'spin' to wait until the resource is unlocked (whichever
> resource it is, probably something network related, though).
>
> This is based upon the fact that this does not happen if you slightly
> decrease the number of apache's; what happens in that case is that the same
> scenario goes on; however after a minute or so the system recovers!
> (probably because it has just enough CPU to handle everything as apache
> hits its limit?)
>
> Now if this is indeed the case i was thinking of something like a sysctl
> MUTEX_BLOCK_THRESHOLD set to something like 50. If the system detects that
> the number of processes locked is higher than this number, then it stops
> 'spinning' for resources, but instead uses a 'blocking' mechanism (simply
> puts the processes in a 'waiting' queue).
>
> I would be very interested to hear what this problem could be; perhaps I can
> test a little if someone has solutions (I can't test much unfortunately,
> it's a production system).
>
> Best Regards,
> Ali Niknam
>


FreeBSD 5.2.1: Mutex/Spinlock starvation?

2004-06-04 Thread Ali Niknam
Hi Guys,

First of all: this is my first posting in this group so please be gentle :)

The other day I was upgrading a system from FreeBSD 4.5 single CPU to
FreeBSD 5.2.1 dual CPU and I came across a terrible problem.

The system is used as a rather busy webserver, with continuously about 1200
apache processes, and about 200 mysql pthreads.

The problem I ran into is that when apache starts it needs to create a lot
of children quickly. When it does so, after about a minute or so a couple
of children go into "Giant" status. After a few seconds more and more
processes go into the Giant state, up to the point that the system becomes
totally unresponsive (even to keyboard input). The only remedy is to
disconnect the UTP (network) cable and wait a few seconds; then kill
everything.

Now the nice part is: this happens only if i set apache's maxclients > 1250.
Under 1250 the same scenario happens but after a minute or so the system
recovers!

Now I unfortunately do not know enough about the internals of BSD to make a
very educated guess, but I'll give it a shot nevertheless: my estimate is that
due to the tremendous number of 'locked' processes the system simply starves
of CPU to do anything. My guess is the locking mechanism probably uses
some kind of 'spin' to wait until the resource is unlocked (whichever
resource it is, probably something network related, though).

This is based upon the fact that this does not happen if you slightly
decrease the number of apache's; what happens in that case is that the same
scenario goes on; however after a minute or so the system recovers!
(probably because it has just enough CPU to handle everything as apache
hits its limit?)

Now if this is indeed the case i was thinking of something like a sysctl
MUTEX_BLOCK_THRESHOLD set to something like 50. If the system detects that
the number of processes locked is higher than this number, then it stops
'spinning' for resources, but instead uses a 'blocking' mechanism (simply
puts the processes in a 'waiting' queue).
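
[Editor's sketch, not part of the original message.] The proposal above
can be written down as a simple policy function. This only illustrates
the idea as described: MUTEX_BLOCK_THRESHOLD is the poster's
hypothetical sysctl, not an existing FreeBSD knob, and wait_strategy is
an invented name.

```python
# Hypothetical tunable from the proposal; 50 is the poster's example value.
MUTEX_BLOCK_THRESHOLD = 50


def wait_strategy(locked_processes, threshold=MUTEX_BLOCK_THRESHOLD):
    """Choose how a contending process should wait under the proposal:
    spin while few processes are locked, block once too many are."""
    if locked_processes > threshold:
        return "block"   # put the process on a waiting queue
    return "spin"        # contention is low; busy-wait for the resource


print(wait_strategy(10))    # light contention
print(wait_strategy(1142))  # a very large number of locked processes
```

The sketch makes the premise visible: it only helps if waiters actually
spin by default, which is the assumption the proposal rests on.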

I would be very interested to hear what this problem could be; perhaps I can
test a little if someone has solutions (I can't test much unfortunately,
it's a production system).

Best Regards,
Ali Niknam
