Re: Out of swap handling and X lockups in 3.2R

1999-09-26 Thread Matthew Dillon


:
:Matthew Dillon wrote:
: 
: What it all comes down to is a juxtaposition of what people believe
: is appropriate versus what people are actually willing to code up.
: I'm willing to code up my importance mechanism idea.  The question is
: whether it's a good enough idea to throw into the tree.
:
:I think it's a good idea. It lets the admin introduce bias in the
:system to protect people/processes who are more likely to use huge
:amounts of memory. Alas, taking the swap space into account in
:addition to RSS seems more important to me. But then, I'm happy with
:the way things are right now.
:
:--
:Daniel C. Sobral   (8-DCS)

I'm going to implement and commit this idea into -CURRENT unless someone
screams.  I think it would be an excellent base on top of which future
sophistication can be added.

-Matt


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Out of swap handling and X lockups in 3.2R

1999-09-23 Thread Matthew Dillon

:Matthew Dillon wrote:
: 
: How about this - add an 'importance' resource.  The lower the number,
: the more likely the process will be killed if the system runs out of
: resources.  We would also make fork automatically decrement the number
: by one in the child.
:
:Well, that's one thing people have asked for. It can be useful, and
:doesn't sound particularly hard to code, nor too intrusive or
:resource-hog. Would make some people happy, in both camps.
:
:Alas, some people will never let go until we have a no overcommit
:switch, and *then* they'll start asking for us to go to the lengths
:Solaris does to reduce the disadvantages.
:
:--
:Daniel C. Sobral   (8-DCS)

My feeling is that adding an importance mechanism would not preclude
adding more sophisticated mechanisms on top of it.

The real question is:  Are enough people interested in me doing the
*basic* (word emphasized) importance mechanism for me to do it?  

What it all comes down to is a juxtaposition of what people believe
is appropriate versus what people are actually willing to code up.
I'm willing to code up my importance mechanism idea.  The question is
whether it's a good enough idea to throw into the tree.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]





Re: Out of swap handling and X lockups in 3.2R

1999-09-23 Thread Daniel C. Sobral

Matthew Dillon wrote:
 
 What it all comes down to is a juxtaposition of what people believe
 is appropriate versus what people are actually willing to code up.
 I'm willing to code up my importance mechanism idea.  The question is
 whether it's a good enough idea to throw into the tree.

I think it's a good idea. It lets the admin introduce bias in the
system to protect people/processes who are more likely to use huge
amounts of memory. Alas, taking the swap space into account in
addition to RSS seems more important to me. But then, I'm happy with
the way things are right now.

--
Daniel C. Sobral   (8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]

"Thus, over the years my wife and I have physically diverged. While
I have zoomed toward a crusty middle-age, she has instead clung
doggedly to the sweet bloom of youth. Naturally I think this unfair.
Yet, if it was the other way around, I confess I wouldn't be happy
either."





Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Ivan


 :where SIZE was 4 MB in this case. I ran it on the console (I've got 64 MB
 :of RAM and 128 MB of swap) until the swap pager went out of space and
 :my huge process was eventually killed as expected. Fine. But when I ran 
 :it under X Window, the system eventually killed the X server (SIZE ~20 MB,
 :RES ~14 MB -- the biggest RES size) instead of my big process (SIZE ~100
 :MB, RES 0K). 
 :
 :My question is: Why was the X server killed ? Was it because the 'biggest'
 :process is the one with the biggest resident memory size ?
 :And if so, why not take into account the total size of processes ?
 
 The algorithm is pretty dumb.  In fact, it would not be too difficult
 to actually calculate the amount of swap being used by a process and
 add that to the RSS when figuring out who to kill.

Thank you for your explanations ! 
I had a look at vm_pageout.c and noticed that situations may occur where
no process can be killed. I guess that in such situations memory
allocation requests are simply rejected ( e.g. malloc returning NULL ) .
Is there a reason why this isn't the default behavior in FreeBSD ? i.e.
why does the system always try to kill a process ?

 
 The X server wasn't killed nicely, it couldn't take you out of the
 video mode.
 
Indeed, the 'biggest' process is SIGKILLed without any prior notice. Would
it be possible to send it a gentler signal first, to give it a chance to
quit before being killed ?

A last question, to FreeBSD developers:
After a few tests, I came to the conclusion that it's quite easy to crash
a vanilla FreeBSD system (without any per-user/per-process limits set) by
simply running it out of swap space ... (the 'kill the biggest process'
mechanism doesn't seem to always work !?) 
Is this a currently addressed issue, or is it simply considered not an
issue ?

Thanks in advance for your time,

Ivan

 






Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Daniel C. Sobral

First, let me warn you that this is an often-recurring thread. It has
already shown up two or three times this year alone.

Ivan wrote:
 
 I had a look at vm_pageout.c and noticed that situations may occur where
 no process can be killed. I guess that in such situations memory
 allocation requests are simply rejected ( e.g. malloc returning NULL ) .

Err... no. Malloc() does not "call" these functions. By the time a
pageout is requested, the malloc() has already finished. The pageout
is being requested because a program is trying to use the memory
that was allocated to it.

 Is there a reason why this isn't the default behavior in FreeBSD ? i.e.
 why does the system always try to kill a process ?

If no process can be killed, the system will panic (or deadlock).

 Indeed, the 'biggest' process is SIGKILLed without any prior notice. Would
 it be possible to send him a nicer signal first, to let him a chance to
 quit before being killed ?

I'd very much like to see swap space being taken into account in
addition to RSS. A runaway program is more likely to have a low RSS
and a large swap than a large RSS.
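
To make the difference concrete, here is a minimal sketch in C of the two
selection rules being compared: largest RSS only, versus RSS plus swap. The
struct and function names are hypothetical and are not taken from
vm_pageout.c; the numbers used to exercise it would be the rough figures
reported earlier in the thread (X server ~14 MB resident, runaway process
~100 MB almost entirely swapped out).

```c
#include <stddef.h>

/* Hypothetical per-process accounting, sizes in kilobytes.
 * These names are illustrative, not kernel structures. */
struct proc_usage {
    int pid;
    unsigned long rss_kb;   /* resident set size */
    unsigned long swap_kb;  /* swap currently backing the process */
};

/* Rule 1: pick the process with the largest RSS.
 * Returns an index into procs, or -1 if the table is empty. */
int pick_by_rss(const struct proc_usage *procs, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || procs[i].rss_kb > procs[best].rss_kb)
            best = (int)i;
    return best;
}

/* Rule 2: pick the process with the largest RSS + swap. */
int pick_by_rss_plus_swap(const struct proc_usage *procs, size_t n)
{
    int best = -1;
    unsigned long best_total = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long total = procs[i].rss_kb + procs[i].swap_kb;
        if (best < 0 || total > best_total) {
            best = (int)i;
            best_total = total;
        }
    }
    return best;
}
```

With an X server at 14 MB resident and a runaway process at 0 KB resident
but 100 MB in swap, rule 1 selects the X server and rule 2 selects the
runaway process.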

Anyway, some Unix systems do send a signal in low memory conditions.
In AIX (the one I'm most familiar with) it is called SIGDANGER, and
its handler defaults to SIG_IGN.

One reason why we do not do this is the lack of support for more
than 32 signals. Alas, I think we now support more than 32 signals,
don't we? If that's the case, I'd think it shouldn't be too
difficult to make the swapper send SIGDANGER to all processes when
it reaches a certain threshold (x% full? xMb left?).
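
From the application side, using such a signal could look roughly like the
sketch below. It assumes nothing about how the kernel would deliver it.
SIGDANGER is only defined on systems that have it, so the sketch falls back
to SIGUSR1 as a stand-in; that fallback, and the names LOWMEM_SIGNAL,
low_memory_seen and install_low_memory_handler, are assumptions made for
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Use the real signal where the platform defines it; otherwise SIGUSR1
 * stands in for it (an assumption for this sketch only). */
#ifdef SIGDANGER
#define LOWMEM_SIGNAL SIGDANGER
#else
#define LOWMEM_SIGNAL SIGUSR1
#endif

/* Set by the handler; the main loop would poll it and release caches,
 * save state, or exit cleanly. */
static volatile sig_atomic_t low_memory_seen = 0;

static void on_low_memory(int signo)
{
    (void)signo;
    low_memory_seen = 1;   /* only async-signal-safe work belongs here */
}

/* Install the handler. Returns 0 on success, -1 on failure. */
int install_low_memory_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_low_memory;
    sigemptyset(&sa.sa_mask);
    return sigaction(LOWMEM_SIGNAL, &sa, NULL);
}
```

A process that never installs a handler would be unaffected if the default
action is to ignore the signal, which is why a default of SIG_IGN makes the
mechanism safe to add.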

 A last question, to FreeBSD developers:
 After a few tests, I came to the conclusion that it's quite easy to crash
 a vanilla FreeBSD system (without any per-user/per-process limits set) by
 simply running it out of swap space ... (the 'kill the biggest process'
 mechanism doesn't seem to always work !?)

'kill the biggest process' should always work. Do you have any test
case where it doesn't?

 Is this a currently addressed issue, or is it simply considered not an
 issue ?

FreeBSD's memory overcommit behavior is not considered an issue by
anyone with the knowledge to do something about it. In fact, these
people consider FreeBSD behavior to be a gain over
non-overcommitting systems (such as Solaris). A lot of people share
this opinion, and some people strongly disagree.

As for the problems that might result from it, the solution is to
use per-process limits through login.conf, and be a good
administrator.

--
Daniel C. Sobral   (8-DCS)



Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Alfred Perlstein


On Thu, 23 Sep 1999, Daniel C. Sobral wrote:

 First, let me warn you that this is an often-recurring thread. It has
 already shown up two or three times this year alone.
 
 Ivan wrote:
  
  I had a look at vm_pageout.c and noticed that situations may occur where
  no process can be killed. I guess that in such situations memory
  allocation requests are simply rejected ( e.g. malloc returning NULL ) .
 
 Err... no. Malloc() does not "call" these functions. By the time a
 pageout is requested, the malloc() has already finished. The pageout
 is being requested because a program is trying to use the memory
 that was allocated to it.
 
  Is there a reason why this isn't the default behavior in FreeBSD ? i.e.
  why does the system always try to kill a process ?
 
 If no process can be killed, the system will panic (or deadlock).

  Indeed, the 'biggest' process is SIGKILLed without any prior notice. Would
  it be possible to send him a nicer signal first, to let him a chance to
  quit before being killed ?
 
 I'd very much like to see swap space being taken into account in
 addition to RSS. A runaway program is more likely to have a low RSS
 and a large swap than a large RSS.
 
 Anyway, some Unix systems do send a signal in low memory conditions.
 In AIX (the one I'm most familiar with) it is called SIGDANGER, and
 its handler defaults to SIG_IGN.
 
 One reason why we do not do this is the lack of support for more
 than 32 signals. Alas, I think we now support more than 32 signals,
 don't we? If that's the case, I'd think it shouldn't be too
 difficult to make the swapper send SIGDANGER to all processes when
 it reaches a certain threshold (x% full? xMb left?).

Terry Lambert brought up an interesting thought from AIX (I think):
instead of killing a process, it just sleeps the requesting process
until the situation alleviates itself.  Of course this can wind up
wedging an entire system, so it would probably be advisable to then
revert to killing when more than a threshold of processes go into
a vmwait sleep.
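
As a rough sketch of that policy in C: sleep the requester while few
processes are waiting, and fall back to killing once too many are. The
names and the percentage-based threshold are assumptions for illustration
only, not anything AIX or FreeBSD is known to implement.

```c
/* What to do with a process whose page request cannot be satisfied. */
enum oom_action { OOM_SLEEP_REQUESTER, OOM_KILL };

/*
 * sleeping      - processes already in the (hypothetical) vmwait sleep
 * total         - processes on the system
 * threshold_pct - percentage of processes allowed to sleep before the
 *                 system reverts to killing
 *
 * The requester is counted as if it were put to sleep; if that would
 * push the sleeping share above the threshold, kill instead.
 */
enum oom_action choose_oom_action(unsigned sleeping, unsigned total,
                                  unsigned threshold_pct)
{
    unsigned long would_sleep = (unsigned long)sleeping + 1;

    if (total == 0)
        return OOM_KILL;
    if (would_sleep * 100UL > (unsigned long)total * threshold_pct)
        return OOM_KILL;
    return OOM_SLEEP_REQUESTER;
}
```

With 100 processes and a 25% threshold, the first 25 requesters are put to
sleep and the 26th triggers a kill.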

As far as signals go, I'm pretty sure it's not committed yet, but I
really hope it goes in soon, it would be extremely helpful.

-Alfred






Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Ivan


On Thu, 23 Sep 1999, Daniel C. Sobral wrote:

  I had a look at vm_pageout.c and noticed that situations may occur where
  no process can be killed. I guess that in such situations memory
  allocation requests are simply rejected ( e.g. malloc returning NULL ) .
 
 Err... no. Malloc() does not "call" these functions. By the time a
 pageout is requested, the malloc() has already finished. The pageout
 is being requested because a program is trying to use the memory
 that was allocated to it.

Of course I didn't mean that malloc() calls the pageout daemon ... I 
simply meant that if no more memory space can be regained (in particular
by killing a process) then at some point memory allocations will be
refused -- or else, when does malloc() ever return NULL ?!

  Is there a reason why this isn't the default behavior in FreeBSD ? i.e.
  why does the system always try to kill a process ?
 
 If no process can be killed, the system will panic (or deadlock).
 
  Indeed, the 'biggest' process is SIGKILLed without any prior notice. Would
  it be possible to send him a nicer signal first, to let him a chance to
  quit before being killed ?
 
 I'd very much like to see swap space being taken into account in
 addition to RSS. A runaway program is more likely to have a low RSS
 and a large swap than a large RSS.
 
 Anyway, some Unix systems do send a signal in low memory conditions.
 In AIX (the one I'm most familiar with) it is called SIGDANGER, and
 its handler defaults to SIG_IGN.
 
 One reason why we do not do this is the lack of support for more
 than 32 signals. Alas, I think we now support more than 32 signals,
 don't we? If that's the case, I'd think it shouldn't be too
 difficult to make the swapper send SIGDANGER to all processes when
 it reaches a certain threshold (x% full? xMb left?).

Or even simply send SIGTERM for instance before SIGKILL ... at least,
that would be understood by many processes (such as the X server).

  A last question, to FreeBSD developers:
  After a few tests, I came to the conclusion that it's quite easy to crash
  a vanilla FreeBSD system (without any per-user/per-process limits set) by
  simply running it out of swap space ... (the 'kill the biggest process'
  mechanism doesn't seem to always work !?)
 
 'kill the biggest process' should always work. Do you have any test
 case where it doesn't?


I logged in and ran this little program this morning on a FreeBSD 3.2R box
(128 MB RAM, 300 MB swap) (try this at home :-):

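#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ISIZE (180*1024*1024)
#define SIZE (1024*1024)

int main(void)
{
 char * a;
 a = (char *) malloc(ISIZE);
 assert(a);
 memset(a,0,ISIZE);   /* touch every page so it is really backed */
 printf("Initial size: %d bytes\n",ISIZE);

 /* one more megabyte per key pressed; stop at end of input */
 while (getchar() != EOF)
 {
   printf("Allocating %d bytes\n",SIZE);
   a = (char *) malloc(SIZE);   /* deliberately never freed */
   assert(a);
   memset(a,0,SIZE);
 }
 return 0;
}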

The machine wasn't too loaded, ( no swapping, active pages ~20% of RAM ).
I let the program ask for memory (pressed a key a certain number of
times), leaving some time though for my process to be almost totally
swapped out (thus ignored by the 'kill the biggest' routine) . After a while,
having reached a '99% swap used' state, everything was locked up (remote
connections, console, etc.), I couldn't even tell which process had been
killed or if something had actually been killed -- we had to reboot :-( 
Yet I'm not certain that this is related to a bug in the pageout daemon
...

  Is this a currently addressed issue, or is it simply considered not an
  issue ?
 
 FreeBSD's memory overcommit behavior is not considered an issue by
 anyone with the knowledge to do something about it. In fact, these
 people consider FreeBSD behavior to be a gain over
 non-overcommitting systems (such as Solaris). A lot of people share
 this opinion, and some people strongly disagree.

At least I think that this overcommit behaviour should be better documented :-)
 
 As for the problems that might result from it, the solution is to
 use per-process limits through login.conf, and be a good
 administrator.
 





Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Chuck Robey

On Wed, 22 Sep 1999, Alfred Perlstein wrote:

 Terry Lambert brought up an interesting thought from AIX (I think),
 instead of killing a process, it just sleeps the requesting process
 until the situation alleviates itself.  Of course this can wind up
 wedging an entire system, it would probably be advisable to then
 revert to killing when more than a threshold of processes go into
 a vmwait sleep.

Seeing as the reason for killing is because you're out of system
resources, and you need to free up some in order to go on, and sleeping
the process isn't going to free up the resources needed, I don't see how
this'll help things.

What kind of resources are there that both cause loss of swap AND are
freed up by sleeping a process?



Chuck Robey                | Interests include C programming, Electronics,
213 Lakeside Dr. Apt. T-1  | communications, and signal processing.
Greenbelt, MD 20770        | I run picnic.mat.net: FreeBSD-current(i386) and
(301) 220-2114             |   jaunt.mat.net : FreeBSD-current(Alpha)







Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Daniel C. Sobral

Ivan wrote:
 
 Of course I didn't mean that malloc() calls the pageout daemon ... I
 simply meant that if no more memory space can be regained (in particular
 by killing a process) then at some point memory allocations will be
 refused -- or else, when does malloc() ever return NULL ?!

When per-process limits have been reached.
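
A small C sketch of that case: lower a per-process limit, then ask for more
than it allows and watch malloc() return NULL. The sketch uses the POSIX
address-space limit RLIMIT_AS, which is an assumption chosen for current
systems; on a system of this era the limit that matters would more likely
be the data segment size (datasize in login.conf). The function name is
hypothetical.

```c
#include <stdlib.h>
#include <sys/resource.h>

/*
 * Temporarily lower the soft address-space limit to `limit` bytes and try
 * to malloc `request` bytes.
 * Returns 1 if malloc() returned NULL, 0 if it succeeded,
 * and -1 if the limit could not be read or changed.
 */
int malloc_refused_under_limit(rlim_t limit, size_t request)
{
    struct rlimit old, lowered;
    void *volatile p;   /* volatile so the allocation is not optimized away */
    int refused;

    if (getrlimit(RLIMIT_AS, &old) != 0)
        return -1;
    lowered = old;
    lowered.rlim_cur = limit;          /* soft limit only; hard limit kept */
    if (setrlimit(RLIMIT_AS, &lowered) != 0)
        return -1;

    p = malloc(request);
    refused = (p == NULL);
    free(p);

    setrlimit(RLIMIT_AS, &old);        /* restore the original soft limit */
    return refused;
}
```

With a 256 MB limit, a 1 GB request should be refused immediately, while a
4 KB request still succeeds. No process has to be killed, because the
refusal happens at allocation time rather than when the pages are touched.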

 Or even simply send SIGTERM for instance before SIGKILL ... at least,
 that would be understood by many processes (such as the X server).

When the time comes to do a SIGKILL, nothing else should be used.
There is +NO+ memory free. A SIGTERM under these circumstances can
lead to a deadlock (or else require disgustingly complex code).

--
Daniel C. Sobral   (8-DCS)



Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Alfred Perlstein


On Wed, 22 Sep 1999, Chuck Robey wrote:

 On Wed, 22 Sep 1999, Alfred Perlstein wrote:
 
  Terry Lambert brought up an interesting thought from AIX (I think),
  instead of killing a process, it just sleeps the requesting process
  until the situation alleviates itself.  Of course this can wind up
  wedging an entire system, it would probably be advisable to then
  revert to killing when more than a threshold of processes go into
  a vmwait sleep.
 
 Seeing as the reason for killing is because you're out of system
 resources, and you need to free up some in order to go on, and sleeping
 the process isn't going to free up the resources needed, I don't see how
 this'll help things.
 
 What kind of resources are there that both cause loss of swap AND are
 freed up by sleeping a process?

four things i can think of:

1) Along with 'SIGDANGER' it allows the system to fix itself.
2) Allow the operator to determine which program to kill, maybe the
   'hog' is actually something that needs to run to completion and
   by shutting down other systems it would survive.
3) other processes may exit, this would free the memory needed to
   continue.
4) the operator could enable swap on an additional device giving
   more backing for things to continue.

don't forget the clause about killing after putting a threshold of
active processes to sleep.

-Alfred






Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Nate Williams

  What kind of resources are there that both cause loss of swap AND are
  freed up by sleeping a process?
 
 four things i can think of:
 
 1) Along with 'SIGDANGER' it allows the system to fix itself.

That's another issue.  Don't mix sleeping processes with SIGDANGER, they
are independent of one another.
There's no mention of SIGDANGER, just of pu

 2) Allow the operator to determine which program to kill, maybe the
'hog' is actually something that needs to run to completion and
by shutting down other systems it would survive.

The operator can't kill anything, since the system would be unusable at
that point, being out of resources and all.  His shell wouldn't even
run.

 3) other processes may exit, this would free the memory needed to
continue.

Maybe, and then again, maybe not.  A program is requesting memory, so
putting other processes to sleep *keeps* them from freeing up memory.

 4) the operator could enable swap on an additional device giving
more backing for things to continue.

See above.  There are no resources available for anything to run, so the
system must do *SOMETHING*.

(Yes, this is a problem with memory-overcommit, but them's the
breaks. :)


Nate





Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Matthew Dillon

How about this - add an 'importance' resource.  The lower the number,
the more likely the process will be killed if the system runs out of 
resources.  We would also make fork automatically decrement the number 
by one in the child.  

The default would be 1000.  The sysadmin could then use login.conf to
lower the hard limit for particular users or user classes, and of course
set a specific limit for particular root-run processes (though, in general,
the daemons will be protected because their children will be more likely
to be killed than they will).

The system would use the importance resource to modify its search for
processes to kill - perhaps use it as a divisor.  Or the system could use
it absolutely then kill the biggest process of the N processes sitting
at the lowest importance level.
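
A minimal sketch of the two variants in C, to show how they can pick
different victims. Nothing here is committed code: the struct, the function
names, and the clamping at zero are assumptions for illustration; only the
default of 1000 and the decrement on fork come from the proposal.

```c
#include <stddef.h>

#define IMPORTANCE_DEFAULT 1000   /* default suggested in the proposal */

/* Hypothetical process table entry. */
struct proc_entry {
    int pid;
    unsigned long size_kb;  /* whatever size metric the kernel uses */
    int importance;         /* lower means killed sooner */
};

/* fork: the child gets the parent's importance minus one.
 * Clamping at 0 is an assumption of this sketch. */
int child_importance(int parent)
{
    return parent > 0 ? parent - 1 : 0;
}

/* Variant 1: importance as a divisor, largest size/importance loses.
 * Returns an index into p, or -1 if the table is empty. */
int pick_victim_divisor(const struct proc_entry *p, size_t n)
{
    int best = -1;
    double best_score = 0.0;

    for (size_t i = 0; i < n; i++) {
        int imp = p[i].importance > 0 ? p[i].importance : 1; /* no /0 */
        double score = (double)p[i].size_kb / imp;
        if (best < 0 || score > best_score) {
            best = (int)i;
            best_score = score;
        }
    }
    return best;
}

/* Variant 2: importance used absolutely. Among the processes at the
 * lowest importance level, the biggest one loses. */
int pick_victim_absolute(const struct proc_entry *p, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (best < 0 ||
            p[i].importance < p[best].importance ||
            (p[i].importance == p[best].importance &&
             p[i].size_kb > p[best].size_kb))
            best = (int)i;
    }
    return best;
}
```

Take three processes: 5000 KB at importance 1000, 25000 KB at importance
999, and 300 KB at importance 500. The divisor variant picks the 25000 KB
process, and the absolute variant picks the 300 KB one. As a divisor, a
difference of one between parent and child barely changes the outcome, so
the fork decrement matters mostly in the absolute variant.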

This also solves the sysad-cant-login problem and the user-is-naughty
problem.

-Matt






Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Alfred Perlstein


On Wed, 22 Sep 1999, Matthew Dillon wrote:

 How about this - add an 'importance' resource.  The lower the number,
 the more likely the process will be killed if the system runs out of 
 resources.  We would also make fork automatically decrement the number 
 by one in the child.  
 
 The default would be 1000.  The sysadmin could then use login.conf to
 lower the hard limit for particular users or user classes, and of course
 set a specific limit for particular root-run processes (though, in general,
 the daemons will be protected because their children will be more likely
 to be killed than they will).
 
 The system would use the importance resource to modify its search for
 processes to kill - perhaps use it as a divisor.  Or the system could use
 it absolutely then kill the biggest process of the N processes sitting
 at the lowest importance level.
 
 This also solves the sysad-cant-login problem and the user-is-naughty
 problem.

I knew it would be Matt to come up with something like this, it sounds
great.

Maybe a limit to how many kills a process can score, meaning that if 
one process seems to be killing a lot of programs the system may come
down and kill it?

This along with sleeping would allow someone to log in, (with a high
importance) and probably su and still be able to manage to save the
box.

maybe? :)

-Alfred






Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread David Scheidt

On Wed, 22 Sep 1999, Nate Williams wrote:

 Maybe, and then again, maybe not.  A program is requesting memory, so
 putting other processes to sleep *keeps* them from freeing up memory.

The process that is trying to use memory is put to sleep.  In the
out-of-swap cases I have seen (which isn't many, because I
build boxes with lots of swap) there has been one rogue process (or group of
related processes) that was using up swap.  When the process hits this, your
problem is going to go away.  It might make sense to wait to wake processes
until resource usage has dropped below some threshold, so that an operator
has a chance to intervene and correct the problem.  

Clearly, this won't solve all problems.  I think it could be made quite
useful, though.

David Scheidt






Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Nate Williams

  Maybe, and then again, maybe not.  A program is requesting memory, so
  putting other processes to sleep *keeps* them from freeing up memory.
 
 The process that is trying to use memory is put to sleep.

Then this 'rogue' process is never allowed to free up any of its
resources, hence the system is *still* out of swap, and all of the
non-offending processes must deal with the out-of-memory situation they
haven't caused, nor can they do anything about it.

So, now we have a system that just takes longer to completely die off
due to lack of resources since we've stopped the biggest offender from
getting bigger.

(Also, it turns out that often enough the process that requests the page
that drives the system over the edge is not in fact the rogue process,
thus causing the system to slowly become unusable with no way of
recovering.)

I'd much rather my system die quickly than slowly, since by dying
quickly I can get back to work much quicker instead of not getting any
work done for a long period of time.



Nate





Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Matthew Dillon

: (Matt)
: How about this - add an 'importance' resource.  The lower the number,
: the more likely the process will be killed if the system runs out of 
: resources.  We would also make fork automatically decrement the number 
: by one in the child.  
: 
: The default would be 1000.  The sysadmin could then use login.conf to
: lower the hard limit for particular users or user classes, and of course
: set a specific limit for particular root-run processes (though, in general,
: the daemons will be protected because their children will be more likely
: to be killed than they will).
: 
: The system would use the importance resource to modify its search for
: processes to kill - perhaps use it as a divisor.  Or the system could use
: it absolutely then kill the biggest process of the N processes sitting
: at the lowest importance level.
: 
: This also solves the sysad-cant-login problem and the user-is-naughty
: problem.
:
:I knew it would be Matt to come up with something like this, it sounds
:great.
:
:Maybe a limit to how many kills a process can score, meaning that if 
:one process seems to be killing a lot of programs the system may come
:down and kill it?
:
:This along with sleeping would allow someone to log in, (with a high
:importance) and probably su and still be able to manage to save the
:box.
:
:maybe? :)
:
:-Alfred

Well, the problem with a score like that is that we don't necessarily
know which processes are responsible for depriving the system of its
resources, so we don't know who to score against.

Reading Nate's most recent email I have to agree that whatever we come
up with, it needs to be (A) relatively simple and deterministic, and
(B) work out of the box.  I think my 'importance' resource limit idea 
covers both bases quite well.

I don't like the sleep idea at all, because it has the same problem
of not really knowing which process to sleep in the first place.  The
use of an importance resource may not be able to immediately
target the exact process causing the problem, but it should do the job 
well enough to avoid putting the system into a state that would require a
reboot to get all the right things working again.  And it certainly can
do the job well enough to allow a sysad to login.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]






Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Alfred Perlstein


On Wed, 22 Sep 1999, Matthew Dillon wrote:

 : (Matt)
 : How about this - add an 'importance' resource.  The lower the number,
 : the more likely the process will be killed if the system runs out of 
 : resources.  We would also make fork automatically decrement the number 
 : by one in the child.  
 : 
 : The default would be 1000.  The sysadmin could then use login.conf to
 : lower the hard limit for particular users or user classes, and of course
 : set a specific limit for particular root-run processes (though, in general,
 : the daemons will be protected because their children will be more likely
 : to be killed than they will).
 : 
 : The system would use the importance resource to modify its search for
 : processes to kill - perhaps use it as a divisor.  Or the system could use
 : it absolutely then kill the biggest process of the N processes sitting
 : at the lowest importance level.
 : 
 : This also solves the sysad-cant-login problem and the user-is-naughty
 : problem.
 :
 :I knew it would be Matt to come up with something like this, it sounds
 :great.
 :
 :Maybe a limit to how many kills a process can score, meaning that if 
 :one process seems to be killing a lot of programs the system may come
 :down and kill it?
 :
 :This along with sleeping would allow someone to log in, (with a high
 :importance) and probably su and still be able to manage to save the
 :box.
 :
 :maybe? :)
 :
 :-Alfred
 
 Well, the problem with a score like that is that we don't necessarily
 know which processes are responsible for depriving the system of its
 resources, so we don't know who to score against.

Chances are that the allocator is the one depriving; it will more than
likely be the one requiring more memory...  In the case of a root process
that has racked up many kills to satisfy its memory requirements it would
be important to recognize that and kill it.  The kills factor should be
reduced after a certain amount of time so long-running processes do not
exhaust reasonable amounts of kills just by living for a long time.
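
One way to picture the decaying kill score, sketched in C. Halving the
count once per elapsed interval is purely an assumption made here to have
something concrete; the idea above only asks that the score shrink over
time. The function names are hypothetical.

```c
#include <limits.h>

/* Kill count charged to a process, halved once for every decay interval
 * that has elapsed since the kills were recorded. */
unsigned decayed_kills(unsigned kills, unsigned intervals)
{
    /* shifting by the full width or more is undefined, so treat it as
     * fully decayed */
    if (intervals >= sizeof(kills) * CHAR_BIT)
        return 0;
    return kills >> intervals;
}

/* Nonzero if the process has caused more kills than it is allowed,
 * after decay has been applied. */
int exceeds_kill_limit(unsigned kills, unsigned intervals, unsigned limit)
{
    return decayed_kills(kills, intervals) > limit;
}
```

A process charged with 8 kills exceeds a limit of 4 right away, but not
after two intervals, when its score has decayed to 2.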

 Reading Nate's most recent email I have to agree that whatever we come
 up with, it needs to be (A) relatively simple and deterministic, and
 (B) work out of the box.  I think my 'importance' resource limit idea 
 covers both bases quite well.

I agree.

 I don't like the sleep idea at all, because it has the same problem
 of not really knowing which process to sleep in the first place.  The
 use of an importance resource may not be able to immediately
 target the exact process causing the problem, but it should do the job 
 well enough to avoid putting the system into a state that would require a
 reboot to get all the right things working again.  And it certainly can
 do the job well enough to allow a sysad to login.

The importance of the sleep is so that the admin can see what has gone
wrong before the OS guesses at it.  We reserve processes for root, why
not a few pages?

I think killing, even when done somewhat intelligently, as the only solution
seems to be the New Jersey approach.

On the other hand, until I grok more of the code I'll gladly take what I
can get. :)

One more last minute idea, with the exception of root processes, what about
trying to kill processes with the same uid?

-Alfred







Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Wes Peters

Chuck Robey wrote:
 
 What kind of resources are there that both cause loss of swap AND are
 freed up by sleeping a process?

Any of them being consumed by short-lived processes that will run to 
completion and exit while everyone else is sleeping.  This assumes such
processes exist, of course.

-- 
"Where am I, and what am I doing in this handbasket?"

Wes Peters Softweyr LLC
[EMAIL PROTECTED]   http://softweyr.com/





Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Wes Peters

Matthew Dillon wrote:
 
 How about this - add an 'importance' resource.  The lower the number,
 the more likely the process will be killed if the system runs out of
 resources.  We would also make fork automatically decrement the number
 by one in the child.

As far as I'm concerned, you could use the UID for this.  ;^)

-- 
"Where am I, and what am I doing in this handbasket?"

Wes Peters Softweyr LLC
[EMAIL PROTECTED]   http://softweyr.com/





Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Wes Peters

Nate Williams wrote:
 
   Maybe, and then again, maybe not.  A program is requesting memory, so
   putting other processes to sleep *keeps* them from freeing up memory.
 
  The process that is trying to use memory is put to sleep.
 
 Then this 'rogue' process is never allowed to free up any of its
 resources, hence the system is *still* out of swap, and all of the
 non-offending processes must deal with the out-of-memory situation they
 haven't caused, nor can they do anything about it.
 
 So, now we have a system that just takes longer to completely die off
 due to lack of resources since we've stopped the biggest offender from
 getting bigger.
 
 (Also, it turns out that often enough the process that requests the page
 that drives the system over the edge is not in fact the rogue process,
 thus causing the system to slowly become unusable with no way of
 recovering.)
 
 I'd much rather my system die quickly than slowly, since by dying
 quickly I can get back to work much quicker instead of not getting any
 work done for a long period of time.

Perhaps keeping track of the most recently memory-hungry process and killing
it, so we get the one that is currently asking for the most pages, or making
the most allocation requests?  (The two are not necessarily the same.)
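
A small sketch of why the two measures can disagree.  The function name
hungriest, the (timestamp, pid, pages) record layout and the sliding window
are all assumptions made for illustration, not anything in the kernel.

```python
from collections import defaultdict


def hungriest(events, now, window):
    """Find the recent heaviest allocators by two different measures.

    events: iterable of (timestamp, pid, pages) allocation records.
    Only events with now - window < timestamp <= now are counted.
    Returns (pid asking for the most pages, pid making the most requests),
    or (None, None) if no event falls inside the window.
    """
    pages = defaultdict(int)
    requests = defaultdict(int)
    for ts, pid, npages in events:
        if now - window < ts <= now:
            pages[pid] += npages
            requests[pid] += 1
    if not pages:
        return None, None
    return max(pages, key=pages.get), max(requests, key=requests.get)
```

For example, one process that asks for 500 pages in a single request and
another that asks for one page three times would be picked by the first and
second measure respectively.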

-- 
"Where am I, and what am I doing in this handbasket?"

Wes Peters Softweyr LLC
[EMAIL PROTECTED]   http://softweyr.com/





Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Chuck Robey

On Wed, 22 Sep 1999, Wes Peters wrote:

 Chuck Robey wrote:
  
  What kind of resources are there that both cause loss of swap AND are
  freed up by sleeping a process?
 
 Any of them being consumed by short-lived processes that will run to 
 completion and exit while everyone else is sleeping.  This assumes such
 processes exist, of course.

OK, you wait until some process exits.  Of course, if none do, then your
entire machine wedges (all processes that ask for memory), instead of just
one.  You can't even start 'kill -9'.

 
 


Chuck Robey                | Interests include C programming, Electronics,
213 Lakeside Dr. Apt. T-1  | communications, and signal processing.
Greenbelt, MD 20770        | I run picnic.mat.net: FreeBSD-current(i386) and
(301) 220-2114             |       jaunt.mat.net : FreeBSD-current(Alpha)







Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Chuck Robey

On Wed, 22 Sep 1999, Alfred Perlstein wrote:

  What kind of resources are there that both cause loss of swap AND are
  freed up by sleeping a process?
 
 four things i can think of:
 
 1) Along with 'SIGDANGER' it allows the system to fix itself.
 2) Allow the operator to determine which program to kill, maybe the
    'hog' is actually something that needs to run to completion and
    by shutting down other systems it would survive.
 3) other processes may exit, this would free the memory needed to
    continue.
 4) the operator could enable swap on an additional device giving
    more backing for things to continue.
 
 don't forget the clause about killing after putting a threshold of
 active processes to sleep.

I'm thinking of the HMO administrator walking into an emergency room,
seeing a doctor who has asked permission to do an emergency procedure to
crack open the chest of someone whose heart has stopped; the HMO
administrator asking "have you tried giving him drugs to lower his
cholesterol count?"

Too little, too late.  When the system is paralyzed, any actions that
unparalyze it are the only acceptable action set.


Chuck Robey                | Interests include C programming, Electronics,
213 Lakeside Dr. Apt. T-1  | communications, and signal processing.
Greenbelt, MD 20770        | I run picnic.mat.net: FreeBSD-current(i386) and
(301) 220-2114             |       jaunt.mat.net : FreeBSD-current(Alpha)







Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Daniel C. Sobral

Matthew Dillon wrote:
 
 How about this - add an 'importance' resource.  The lower the number,
 the more likely the process will be killed if the system runs out of
 resources.  We would also make fork automatically decrement the number
 by one in the child.

Well, that's one thing people have asked for. It can be useful, and
doesn't sound particularly hard to code, nor too intrusive or
resource-hog. Would make some people happy, in both camps.

Alas, some people will never let go until we have a no overcommit
switch, and *then* they'll start asking for us to go to the lengths
Solaris does to reduce the disadvantages.
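
To make the proposal concrete, here is a minimal sketch of the 'importance'
idea in Python rather than kernel C.  The Proc class, the starting value of
100 and the tie-break on pid are assumptions for illustration; only "lower
number is killed first" and "fork decrements by one" come from the proposal.

```python
from dataclasses import dataclass, field
from itertools import count

_next_pid = count(1)  # toy pid allocator for the sketch


@dataclass
class Proc:
    name: str
    importance: int
    pid: int = field(default_factory=lambda: next(_next_pid))

    def fork(self, name: str) -> "Proc":
        # The child starts one step less important than its parent.
        return Proc(name, self.importance - 1)


def pick_victim(procs):
    """Return the process to kill when the system runs out of resources.

    Lowest importance goes first; among equals, the newest process
    (highest pid) is chosen, which is an assumed tie-break.
    """
    return min(procs, key=lambda p: (p.importance, -p.pid))
```

A chain such as init (100) forking a shell (99) forking a runaway job (98)
would then lose the runaway job first, without any per-process tuning.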

--
Daniel C. Sobral   (8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]

"Thus, over the years my wife and I have physically diverged. While
I have zoomed toward a crusty middle-age, she has instead clung
doggedly to the sweet bloom of youth. Naturally I think this unfair.
Yet, if it was the other way around, I confess I wouldn't be happy
either."





Re: Out of swap handling and X lockups in 3.2R

1999-09-22 Thread Chuck Robey

On Thu, 23 Sep 1999, Daniel C. Sobral wrote:

 Matthew Dillon wrote:
  
  How about this - add an 'importance' resource.  The lower the number,
  the more likely the process will be killed if the system runs out of
  resources.  We would also make fork automatically decrement the number
  by one in the child.
 
 Well, that's one thing people have asked for. It can be useful, and
 doesn't sound particularly hard to code, nor too intrusive or
 resource-hog. Would make some people happy, in both camps.
 
 Alas, some people will never let go until we have a no overcommit
 switch, and *then* they'll start asking for us to go to the lengths
 Solaris does to reduce the disadvantages.

Ohh, don't bring that up, mi has been waiting for that ...



Chuck Robey                | Interests include C programming, Electronics,
213 Lakeside Dr. Apt. T-1  | communications, and signal processing.
Greenbelt, MD 20770        | I run picnic.mat.net: FreeBSD-current(i386) and
(301) 220-2114             |       jaunt.mat.net : FreeBSD-current(Alpha)







Re: Out of swap handling and X lockups in 3.2R

1999-09-21 Thread Matthew Dillon

:where SIZE was 4 MB in this case. I ran it on the console (I've got 64 MB
:of RAM and 128 MB of swap) until the swap pager went out of space and
:my huge process was eventually killed as expected. Fine. But when I ran 
:it under X Window, the system eventually killed the X server (SIZE ~20 MB,
:RES ~14 MB -- the biggest RES size) instead of my big process (SIZE ~100
:MB, RES 0K). 
:
:My question is: Why was the X server killed ? Was it because the 'biggest'
:process is the one with the biggest resident memory size ?
:And if so, why not take into account the total size of processes ?

The algorithm is pretty dumb.  In fact, it would not be too difficult
to actually calculate the amount of swap being used by a process and
add that to the RSS when figuring out who to kill.
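
A rough illustration of the difference, in Python rather than kernel C,
using the approximate figures from the report above (X server: SIZE ~20 MB,
RES ~14 MB; big process: SIZE ~100 MB, RES 0K).  Treating swap use as
SIZE minus RES is an assumption made for this sketch, as are the function
and field names.

```python
def victim_by_rss(procs):
    """Pick the process with the largest resident set only."""
    return max(procs, key=lambda p: p["rss_mb"])


def victim_by_rss_plus_swap(procs):
    """Pick the process with the largest resident set plus swap use."""
    return max(procs, key=lambda p: p["rss_mb"] + p["swap_mb"])


# swap_mb is approximated as SIZE - RES (an assumption, see above).
procs = [
    {"name": "X server", "rss_mb": 14, "swap_mb": 6},
    {"name": "big process", "rss_mb": 0, "swap_mb": 100},
]
```

Judged on RSS alone the X server is chosen; adding swap use to the score
selects the big process instead.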

:This leads me to another (not related to swap) question:
:
:When the X server is killed, the machine simply hangs without any
:reaction to Ctrl-Alt-F1 or even Ctrl-Alt-Del. Is that the normal
:behaviour ? (I think it should get the user back to the console ?!)
:Is there any workaround ?
:
:TIA,
:
:Ivan

The X server wasn't killed nicely, so it couldn't take you out of the
video mode.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

