Re: Ideas for the oom problem

2001-03-28 Thread Hacksaw

> On Wed, Mar 28, 2001 at 06:33:04PM -0500, Hacksaw wrote:

> Why are they logged in as root in the first place? Is there something they
> can't do over sudo?

I have the "Gnome workstation" version of rawhide (7.0.xxx) on my new laptop. 
I don't see sudo. Of course, it's rawhide, but you'd think that if it were in 7.0,
it would make it into rawhide. Or maybe they decided that the gnome workstation didn't need
it... Hmm.

> I definitely remember seeing a document saying `if you find yourself needing to
> `man foo', do it in another terminal as your non-root self'; it might or might
> not've been the SAG.

Sucks if you are trying to figure out a VT problem. 
 
> In any case, what happened to `if you use this rope you will hang yourself'?
> There has to be a point where you abandon catering for all kinds of fool and
> get on with writing something useful, I think.

Let's accept one thing: root should, in fact, be allowed to do anything a
regular user can. The fact that hanging is a possibility ought to be
pointed out. I have my shell set up to tell me I'm root. But the fact is, the
typical sys-admin is essentially always logged in as root somewhere, and
changing terminals to look at man pages is sometimes not an option.

For that matter, I have often figured out that something had funny permission 
problems by discovering that the problem goes away if I run a program as root.

Assuming everything root is doing must be sacrosanct is a pipe dream.
Assuming everything a regular user is doing is expendable is BOFH thinking.

I do agree that you have to draw a line. I'm just saying that's the wrong one.

> > I completely agree that doing general work as root is a bad idea. I do most
> > root things via sudo. It sure would be nice if all the big dists supplied it
> > (Hey, RedHat! You listening?) as part of their normal set.
> 
> RH have been listening since v7.0.

Good. I hope it comes out well in 7.1, considering my experience with rawhide.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Ideas for the oom problem

2001-03-28 Thread Tim Haynes

On Wed, Mar 28, 2001 at 06:33:04PM -0500, Hacksaw wrote:

> > Anyone working as root is (sorry) an idiot! root's processes are normally
> > quite system-relevant and so they should never be killed, if we can avoid 
> > it.
> 
> The real world intrudes. Root sometimes needs to look at documentation,
> which these days is often available as html. Sometimes it's only available
> as html. And people in a panic who aren't trained sys-admins aren't going to
> remember to log in as someone else.

Why are they logged in as root in the first place? Is there something they
can't do over sudo?
I definitely remember seeing a document saying `if you find yourself needing to
`man foo', do it in another terminal as your non-root self'; it might or might
not've been the SAG.

In any case, what happened to `if you use this rope you will hang yourself'?
There has to be a point where you abandon catering for all kinds of fool and
get on with writing something useful, I think.

> I completely agree that doing general work as root is a bad idea. I do most
> root things via sudo. It sure would be nice if all the big dists supplied it
> (Hey, RedHat! You listening?) as part of their normal set.

RH have been listening since v7.0.

~Tim



Re: Ideas for the oom problem

2001-03-28 Thread Hacksaw

> --On Wednesday, March 28, 2001 09:38:04 -0500 Hacksaw <[EMAIL PROTECTED]> 
> wrote:
> >
> > Deciding what not to kill based on who started it seems like a bad idea.
> > Root  can start netscape just as easily as any user, but if the choice of
> > processes  to kill is root's netscape or a user's experimental database,
> > I'd want the  netscape to go away.
> 
> root does not use netscape -FULLSTOP-

Making assumptions about what users will do is foolish. 

> Anyone working as root is (sorry) an idiot! root's processes are normally
> quite system-relevant and so they should never be killed, if we can avoid 
> it.

The real world intrudes. Root sometimes needs to look at documentation, which
these days is often available as html. Sometimes it's only available as html. And
people in a panic who aren't trained sys-admins aren't going to remember to log
in as someone else.

I completely agree that doing general work as root is a bad idea. I do most 
root things via sudo. It sure would be nice if all the big dists supplied it 
(Hey, RedHat! You listening?) as part of their normal set.

> There can, however, be processes owned by other users which shouldn't be
> killed in an OOM situation, but generally root's processes are more important
> than a normal user's processes.

I'd suggest that this is going to change. Not for regular users, though, so
it's still a good point. But we should be figuring out how to compartmentalize
all our servers. Most servers rarely need to run as root; only login servers do,
and those should be limited.

So which should die: the user's experiment, or identd?

> What about doing something really critical to avoid the upcoming OOM situation
> and getting your shell killed because you were too slow?

Right. I agree that root's shell should be exempt. It may be that all shells
should be exempt, or maybe all recent shells.

Better, though, would be to establish the idea of "linchpins".

A linchpin is a process marked with a don't-kill-on-OOM flag (a capability?).
Only members of the root group should be able to start one. And darn few things
should be marked as such: some very small shell, vi, ed, maybe a small emacs.
Just enough so that our heroic admin can gracefully ease the OOM situation by
changing a few bits of /etc or killing off a few well-chosen processes.

On the other hand, a flag that says "kill me first" might be even better.

In any case, I'd certainly expect the OOM killer to sort by memory usage, and 
kill off the hogs first. I assume it does that.
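For illustration, a minimal Python sketch of the selection policy described above, under stated assumptions: `Task`, `linchpin` and `kill_me_first` are hypothetical names (no such flags exist in the kernels being discussed), and `rss_kb` stands in for whatever memory-usage measure a real OOM killer would use. The owner's uid is deliberately not consulted.

```python
from dataclasses import dataclass


@dataclass
class Task:
    pid: int
    name: str
    rss_kb: int                  # resident memory in KiB (stand-in metric)
    linchpin: bool = False       # hypothetical "never kill on OOM" flag
    kill_me_first: bool = False  # hypothetical "sacrifice me first" flag


def pick_oom_victim(tasks):
    """Return the task to kill, or None if every task is a linchpin."""
    candidates = [t for t in tasks if not t.linchpin]
    if not candidates:
        return None
    # Tuples compare element-wise: volunteers (True > False) are chosen
    # first, then the biggest memory hog within that group.
    return max(candidates, key=lambda t: (t.kill_me_first, t.rss_kb))
```

With this ordering, root's netscape and a user's database are ranked by size alone, which is the behaviour argued for in this thread.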






Re: Ideas for the oom problem

2001-03-28 Thread Andreas Rogge

--On Wednesday, March 28, 2001 09:38:04 -0500 Hacksaw <[EMAIL PROTECTED]> 
wrote:
>
> Deciding what not to kill based on who started it seems like a bad idea.
> Root  can start netscape just as easily as any user, but if the choice of
> processes  to kill is root's netscape or a user's experimental database,
> I'd want the  netscape to go away.

root does not use netscape -FULLSTOP-

Anyone working as root is (sorry) an idiot! root's processes are normally
quite system-relevant and so they should never be killed, if we can avoid 
it.
There can, however, be processes owned by other users which shouldn't be
killed in an OOM situation, but generally root's processes are more important
than a normal user's processes.
What about doing something really critical to avoid the upcoming OOM situation
and getting your shell killed because you were too slow?

--
Andreas Rogge <[EMAIL PROTECTED]>
Available on IRCnet:#linux.de as Dyson



Re: Ideas for the oom problem

2001-03-28 Thread Hacksaw

> a. don't kill any task with a uid < 100 
> 
> b. if uid between 100 and 500 or CAP-SYS equivalent enabled
>   set it to a lower priority, so if it is at fault it will happen slower
>
> giving more time before the system collapses

Deciding what not to kill based on who started it seems like a bad idea. Root 
can start netscape just as easily as any user, but if the choice of processes 
to kill is root's netscape or a user's experimental database, I'd want the 
netscape to go away.






Re: Ideas for the oom problem

2001-03-27 Thread Jonathan Morton

I'm going to be gentle here and try to point out where your suggestions are
flawed...

>a. don't kill any task with a uid < 100

Suppose your system daemon springs a leak?  It will have to be killed
eventually; however, system daemons can sensibly be given a little "grace".
Also, the UIDs used by system daemons vary from system to system.
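A small sketch of "grace" rather than outright exemption, assuming a hypothetical `is_system_daemon` flag that is set explicitly (since, as noted, UID ranges differ between systems) and an arbitrary `grace_factor` of 4. A daemon's score is scaled down, so a daemon that leaks badly enough still ends up ranked first.

```python
def badness(rss_kb, is_system_daemon, grace_factor=4):
    """Hypothetical OOM score: a higher score means "kill this first".

    System daemons are not exempt; their score is divided by grace_factor,
    so a daemon that leaks far enough still ends up at the top.
    """
    return rss_kb // grace_factor if is_system_daemon else rss_kb


def pick_victim(tasks, grace_factor=4):
    """tasks: iterable of (name, rss_kb, is_system_daemon) tuples."""
    name, _, _ = max(tasks, key=lambda t: badness(t[1], t[2], grace_factor))
    return name
```

A small identd loses to a modest editor process, but a daemon that has grown to hundreds of megabytes does not.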

>b. if uid between 100 and 500 or CAP-SYS equivalent enabled
>   set it to a lower priority, so if it is at fault it will happen
>slower
>giving more time before the system collapses

Not slowly enough.  When your system is thrashing, the CPU is the resource
under least pressure, so "nice" values and priorities have virtually zero
effect.  In any case, under OOM conditions the system has *already*
collapsed and we *have* to kill something for the system to keep running.

>c.  if a task is nice'd then immediately put the task to sleep, and schedule
>all code / data to be swapped out, or thrown away as appropriate. do not
>reschedule the task to continue until memory is available

In OOM conditions there is no swap space left to do what you suggest.  This
is a sensible solution for when thrashing is the only problem...

>d. kill any normal user interactive tasks that are started during a memory
>crisis.

Define "memory crisis".  However, this is a relatively sensible solution.

>allocate a pool of memory at system start up that is to be released to the
>memory pool when the system is in a memory crisis. This will reduce system
>swapping, and allow the system to stabilize slightly

One of my patches already tries to do this, in a way.  It doesn't yet
provide a hard barrier, but it does prevent applications from hogging the
entire memory on the system (at least, not without expending some effort on
it).
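A toy model of such a reserve, for illustration only: this is not the patch referred to above, and `MemoryPool`, `for_kernel` and the page counts are invented. Unlike the patch as described, this version enforces a hard barrier against userspace.

```python
class MemoryPool:
    """Toy memory pool with a reserve that only the kernel may dip into."""

    def __init__(self, total_pages, reserve_pages):
        if not 0 <= reserve_pages <= total_pages:
            raise ValueError("reserve must fit inside the pool")
        self.free = total_pages
        self.reserve = reserve_pages

    def alloc(self, pages, for_kernel=False):
        # Userspace requests must leave the reserve untouched; kernel
        # requests (e.g. swap-in, COW faults) may consume it.
        floor = 0 if for_kernel else self.reserve
        if self.free - pages < floor:
            return False
        self.free -= pages
        return True

    def release(self, pages):
        self.free += pages
```

An application can take everything down to the reserve, after which only kernel-internal requests succeed.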

>report any task asking for a large pool of memory while the system is in
>an oom crisis. if uid > 500 and it was started from an interactive shell it
>should be killed.

See above.  malloc() fails, which tells the application there is no more
memory in the system.  A well-written application will respond to this and
use more memory-conservative techniques.  A poorly-written application will
segfault.  End Of Problem.  Now to make memory accounting work properly so
these tests are reliable...
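A sketch of the "more memory-conservative" response, in Python, where a failed allocation raises `MemoryError`. The halving strategy and the name `make_buffer` are assumptions for illustration; `alloc` is injectable so the failure path can be exercised. Under Linux overcommit an allocation may appear to succeed and fail only when the pages are touched, which is the memory-accounting caveat above.

```python
def make_buffer(preferred, minimum, alloc=bytearray):
    """Try to allocate `preferred` bytes; on failure, halve the request
    until it succeeds or drops below `minimum`."""
    if minimum < 1:
        raise ValueError("minimum must be at least 1 byte")
    size = preferred
    while size >= minimum:
        try:
            return alloc(size)
        except MemoryError:
            size //= 2  # back off and try a smaller working set
    raise MemoryError(f"could not allocate even {minimum} bytes")
```

The caller then processes its data in chunks of whatever size it actually obtained, instead of crashing.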

>when the crisis is ended, re-acquire the memory pool for later usage.

It is never given up, except when it is needed by the kernel itself (e.g. to
swap in pages or, in the absence of true memory accounting, to provide COW
space).

>Prong 3 providing information about oom crisis to user land
>
>create /proc/vm/oom_crisis this would be a readonly file owned by root it would
>report if the system is in crisis and the uid of any process that is asking
>for large amounts of ram while the system
>is in crisis.

This kind of information is already available using /proc - applications
just have to look in the right places.
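As a rough sketch of "looking in the right places", the following parses the `Key:   value kB` lines of `/proc/meminfo`. The function names are hypothetical, and counting all page cache and buffers as available is an optimistic simplification.

```python
def parse_meminfo(text):
    """Parse "Key:   value kB" lines of /proc/meminfo into {key: KiB}.

    Lines that lack a trailing "kB" unit are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and len(fields) == 2 and fields[0].isdigit() and fields[1] == "kB":
            info[key.strip()] = int(fields[0])
    return info


def roughly_available_kb(info):
    # Optimistic estimate: treats all page cache and buffers as reclaimable.
    return sum(info.get(k, 0) for k in ("MemFree", "Buffers", "Cached", "SwapFree"))
```

On a live system this would be fed with `open("/proc/meminfo").read()`.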

>create a SIGDANGER handler that is sent out to all tasks that have
>registered a handler when the kernel enters oom_kill, give these tasks a high
>priority access to system resources.

This is a fairly good idea, why does it look so familiar?  :)  SIGDANGER
would be sent to all processes when memory availability goes below a
threshold, i.e. when there is still enough memory left to handle the
situation.  The default handler would be a no-op, preserving compatibility.
However, the notion of "high priority access to resources" is not currently
feasible (or necessary).
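SIGDANGER is an AIX signal with no Linux counterpart, so the following is only a userspace model of the delivery rule just described: notify once when available memory drops below a threshold, and re-arm only after it recovers. `LowMemoryNotifier` and its callback interface are hypothetical. With no callbacks registered it does nothing, mirroring the no-op default handler.

```python
class LowMemoryNotifier:
    """Edge-triggered low-memory warning, modelled on the SIGDANGER idea."""

    def __init__(self, threshold_kb):
        self.threshold_kb = threshold_kb
        self.callbacks = []      # none registered == default no-op
        self.in_danger = False

    def register(self, callback):
        self.callbacks.append(callback)

    def update(self, available_kb):
        below = available_kb < self.threshold_kb
        # Fire only on the transition into danger, not on every sample.
        if below and not self.in_danger:
            for cb in self.callbacks:
                cb(available_kb)
        self.in_danger = below
```

Edge triggering avoids flooding handlers with repeated warnings while memory stays low.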

>this would enable user land programs to deal with the situation without
>continuous polling of free ram/swap. They could email/page sysadmin and user
>about the crisis and add additional swap resources and kill any known non
>essential tasks. and probe system for possible broken tasks, such as
>netscape-common tasks not connected to netscape client, at least i have been
>known to find these when netscape crashes.

Interesting applications for this signal.  However, this is entirely a
userspace issue as to what to do with the signal - the kernel's job is to
provide it (if we decide to, that is).

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-END GEEK CODE BLOCK-





Re: Ideas for the oom problem

2001-03-27 Thread Rik van Riel

On Tue, 27 Mar 2001, Doug Ledford wrote:

> Now, I wouldn't bring this up as a big issue except I keep seeing
> people say things like "why so complex a solution for something that
> is only used in emergency situations".  My point is that it *IS NOT*
> being used only in emergency situations, and that is what needs to be fixed.

Exactly.

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: Ideas for the oom problem

2001-03-27 Thread Doug Ledford

Rik van Riel wrote:
> 
> On Tue, 27 Mar 2001, Doug Ledford wrote:
> 
> > I've been using our internal tree for my testing, and I'm reluctant to
> > let my experiences there cause me to draw conclusions about other
> > trees.  So, will you please tell me which version of the kernel you
> > think has a vm that only triggers the oom killer in emergency
> > situations so I can test it here to see if you are right?
> 
> Detecting WHEN we're OOM is quite unrelated to choosing WHAT
> to do when we're OOM.
> 
> There is currently no kernel that I'm aware of which does the
> OOM kill at the "exact right" moment.

I'm not looking for "exact right".  I'm looking for "in the ballpark".  Hell,
I'm not even that picky.  "In the right country" will do for me.  But right
now, what I'm seeing is a vm that will trigger the oom_killer with 900MB of a
1GB machine used for nothing but disk cache.
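A deliberately simplified sketch of the check being argued for: count clean page cache as reclaimable before declaring OOM. The function and its parameters are hypothetical; real reclaim is not free (dirty pages must be written back first), so the sum is an upper bound rather than a guarantee.

```python
def should_invoke_oom_killer(free_kb, clean_cache_kb, free_swap_kb, request_kb):
    """Hypothetical OOM test: only declare OOM when a request cannot be met
    even after dropping clean page cache and using free swap."""
    # Clean cache pages can be discarded without I/O, so count them as available.
    reclaimable_kb = free_kb + clean_cache_kb + free_swap_kb
    return request_kb > reclaimable_kb
```

With roughly 900MB of clean cache on a 1GB machine, a small allocation should never reach the OOM killer under this rule.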

Now, I wouldn't bring this up as a big issue except I keep seeing people say
things like "why so complex a solution for something that is only used in
emergency situations".  My point is that it *IS NOT* being used only in
emergency situations, and that is what needs to be fixed.  Now, I'm willing to allow
that our internal kernel may trigger an oom at different times than the kernel
you use.  That's why I asked what kernel you want me to test in order to
establish whether or not I'm right about how far off the oom_killer trigger
really is.

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems



Re: Ideas for the oom problem

2001-03-27 Thread Rik van Riel

On Tue, 27 Mar 2001, Doug Ledford wrote:

> I've been using our internal tree for my testing, and I'm reluctant to
> let my experiences there cause me to draw conclusions about other
> trees.  So, will you please tell me which version of the kernel you
> think has a vm that only triggers the oom killer in emergency
> situations so I can test it here to see if you are right?

Detecting WHEN we're OOM is quite unrelated to choosing WHAT
to do when we're OOM.

There is currently no kernel that I'm aware of which does the
OOM kill at the "exact right" moment.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: Ideas for the oom problem

2001-03-27 Thread james

On Tuesday 27 March 2001 18:52, Rik van Riel wrote:
> On Tue, 27 Mar 2001, james wrote:
> > Here are my ideas on how to deal with the oom situation,
> >
> > I propose a three-prong approach to this problem
>
> Isn't that a bit much for an emergency situation that never
> even occurs on most systems ?
>
> Rik
> --
> Virtual memory is like a game you can't win;
> However, without VM there's truly nothing to lose...
>
>   http://www.surriel.com/
> http://www.conectiva.com/ http://distro.conectiva.com.br/


Given the amount of traffic this topic has generated on this mailing list and
elsewhere, I don't think it is too much. Most of what I propose is not new; it
was proposed by others on this list. Prong 1 is pretty much what oom_kill does,
with some slight changes and the addition of putting niced tasks to sleep.
Prong 2 is a variation of providing resources to the root user, along with some
resource accounting information that can be used both in the kernel and in
userland. If we don't pick the right task, the problem continues to progress
until the right task is found or the system is brought to its knees. Prong 3
provides a way to communicate with userland, as AIX does, and adds some level
of being proactive instead of just reactive, which is where, according to other
readers of this list, we have until now been doing the wrong thing.


james




Re: Ideas for the oom problem

2001-03-27 Thread Doug Ledford

Rik van Riel wrote:
> 
> On Tue, 27 Mar 2001, james wrote:
> 
> > Here are my ideas on how to deal with the oom situation,
> 
> > I propose a three-prong approach to this problem
> 
> Isn't that a bit much for an emergency situation that never
> even occurs on most systems ?

I've been using our internal tree for my testing, and I'm reluctant to let my
experiences there cause me to draw conclusions about other trees.  So, will
you please tell me which version of the kernel you think has a vm that only
triggers the oom killer in emergency situations so I can test it here to see
if you are right?

-- 

 Doug Ledford <[EMAIL PROTECTED]>  http://people.redhat.com/dledford
  Please check my web site for aic7xxx updates/answers before
  e-mailing me about problems



Re: Ideas for the oom problem

2001-03-27 Thread Rik van Riel

On Tue, 27 Mar 2001, james wrote:

> Here are my ideas on how to deal with the oom situation,

> I propose a three-prong approach to this problem

Isn't that a bit much for an emergency situation that never
even occurs on most systems ?

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Ideas for the oom problem

2001-03-27 Thread james


Hi Kernel Gurus

Here are my ideas on how to deal with the oom situation. Most of these
should be thought of as work for 2.5.x kernels, because they touch a lot
of kernel code paths, with possible backporting once they have been
tested.

I propose a three-prong approach to this problem.

Prong 1: WHAT TO KILL

a. don't kill any task with a uid < 100 

b. if uid is between 100 and 500, or a CAP-SYS equivalent is enabled,
set the task to a lower priority, so that if it is at fault, the damage
happens more slowly, giving more time before the system collapses.

c. if a task is niced, immediately put it to sleep, and schedule all its
code/data to be swapped out or thrown away, as appropriate. Do not
reschedule the task to continue until memory is available.

d. kill any normal-user interactive task that is started during a memory
crisis.
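One way to read rules a-d, as a Python sketch. The rule order (first match wins), the inclusive 100-500 range, `has_cap_sys` as a stand-in for the "CAP-SYS equivalent", and `nice > 0` meaning "niced" are all interpretive assumptions, not part of the proposal.

```python
from enum import Enum


class Action(Enum):
    NEVER_KILL = "never kill"                # rule a
    DEPRIORITIZE = "lower its priority"      # rule b
    SUSPEND = "sleep until memory is free"   # rule c
    KILL = "kill"                            # rule d
    DEFAULT = "ordinary OOM handling"


def prong1_action(uid, has_cap_sys, nice, interactive, started_in_crisis):
    """Apply rules a-d in order; the first rule that matches wins."""
    if uid < 100:
        return Action.NEVER_KILL
    if uid <= 500 or has_cap_sys:  # uid is already >= 100 here
        return Action.DEPRIORITIZE
    if nice > 0:
        return Action.SUSPEND
    if interactive and started_in_crisis:
        return Action.KILL
    return Action.DEFAULT
```

Writing the rules out this way makes the objections raised elsewhere in the thread concrete: a leaking daemon under uid 100 is never killed, and rule b only changes priority.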

Prong 2: WHAT TO DO ABOUT STABILIZING THE SYSTEM

allocate a pool of memory at system startup that is to be released to the
memory pool when the system is in a memory crisis. This will reduce system
swapping and allow the system to stabilize slightly.

report any task asking for a large amount of memory while the system is in an
oom crisis. If its uid is > 500 and it was started from an interactive shell,
it should be killed.

when the crisis has ended, re-acquire the memory pool for later use.
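A toy model of this prong, for illustration only: `EmergencyReserve`, its method names and the page counts are hypothetical. One design choice the proposal leaves open is what happens if the reserve cannot be fully re-acquired when the crisis ends. This sketch takes back whatever is free and leaves the rest for a later call.

```python
class EmergencyReserve:
    """Toy model of Prong 2: pages set aside at boot, released into the
    free pool when a crisis starts, and re-acquired once it ends."""

    def __init__(self, total_pages, reserve_pages):
        if not 0 <= reserve_pages <= total_pages:
            raise ValueError("reserve must fit inside the pool")
        self.reserve_size = reserve_pages
        self.held = reserve_pages                # pages currently set aside
        self.free = total_pages - reserve_pages  # pages others may allocate

    def alloc(self, pages):
        if pages > self.free:
            return False
        self.free -= pages
        return True

    def free_pages(self, pages):
        self.free += pages

    def enter_crisis(self):
        # Hand the whole reserve to the free pool.
        self.free += self.held
        self.held = 0

    def end_crisis(self):
        # Take back only what is free right now; calling again later
        # tops the reserve up further.
        take = min(self.reserve_size - self.held, self.free)
        self.free -= take
        self.held += take
```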

Prong 3: PROVIDING INFORMATION ABOUT AN OOM CRISIS TO USER LAND

create /proc/vm/oom_crisis: this would be a read-only file owned by root. It
would report whether the system is in crisis and the uid of any process that
is asking for large amounts of RAM while the system is in crisis.

create a SIGDANGER signal that is sent to all tasks that have registered a
handler for it when the kernel enters oom_kill, and give these tasks
high-priority access to system resources.

this would enable user-land programs to deal with the situation without
continuously polling free RAM/swap. They could email/page the sysadmin and user
about the crisis, add additional swap resources, kill any known non-essential
tasks, and probe the system for possibly broken tasks, such as netscape-common
tasks not connected to a netscape client (I have been known to find these
when netscape crashes).
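A sketch of the last check, assuming the helper normally runs as a child of the client (an assumption about netscape's process layout). When a parent dies, Unix re-parents its children to init (PID 1), so a stray helper shows up with a parent that is not a live client. `orphaned_helpers` is a hypothetical name; the `(pid, ppid, name)` tuples could be gathered from `/proc/<pid>/stat`.

```python
def orphaned_helpers(procs, helper="netscape-common", client="netscape"):
    """Return PIDs of `helper` processes whose parent is not a live `client`.

    procs: iterable of (pid, ppid, name) tuples.
    """
    procs = list(procs)
    clients = {pid for pid, _, name in procs if name == client}
    return sorted(pid for pid, ppid, name in procs
                  if name == helper and ppid not in clients)
```

A watcher woken by a low-memory notification could report or kill the returned PIDs.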



Okay, that is my idea. I am putting on my flameproof suit and getting ready
for the flames that are sure to come my way.



James 
kernelnewbie in training 