Re: 2.6.23.8: OOM killer kills wrong jobs

2007-12-17 Thread taow
> You will *probably* get stable 16GB with the vendor tuned enterprise
> kernels (RHEL, CentOS etc),

That sounds like "a little" relief. Thesis 1, 2, and 3 have 16GB of memory. Aries has 12GB.

Tony Wang


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.23.8: OOM killer kills wrong jobs

2007-12-17 Thread Sylvain Robitaille

(Removing Alan Cox from the Cc: list; he does not need to be involved
in the details of our discussions of local systems ...)

On Mon, 17 Dec 2007 [EMAIL PROTECTED] wrote:

> ... Thesis 1, 2, and 3 have 16GB of memory. Aries has 12GB.

Note that the Theses and Aries are Xeon systems, which are 32-bit systems
in the first place, so the problem described is likely not applicable
to these.  I fully expect that the problem being encountered on Baby Alcor
(and Bonnie?) is specifically because of the large memory configuration,
on a 64-bit system, but running a 32-bit OS configuration.  At least
that's how I understood the explanation.

-- 
--
Sylvain Robitaille  [EMAIL PROTECTED]

Systems and Network analyst   Concordia University
Instructional & Information Technology   Montreal, Quebec, Canada
--



Re: 2.6.23.8: OOM killer kills wrong jobs

2007-12-17 Thread Alan Cox
On Mon, 17 Dec 2007 10:44:05 -0500 (EST)
[EMAIL PROTECTED] wrote:

> > You will *probably* get stable 16GB with the vendor tuned enterprise
> > kernels (RHEL, CentOS etc),
> 
> That sounds like "a little" relief. Thesis 1, 2, and 3 have 16GB of memory. Aries has 12GB.

If you can run a 64bit kernel, it will save you an inordinate amount of
pain on a box with > 4GB of RAM. The fact > 4GB is possible on such a box
in 32bit doesn't make it a good idea.
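
As a rough illustration of the check implied here, the following Python sketch flags a machine that reports a 32-bit x86 kernel while having more than 4GB of RAM. The function names are hypothetical, and it assumes the kernel is PAE-enabled so that MemTotal in /proc/meminfo reflects all installed memory. The machine string may also be misleading if a 32-bit personality is in effect.

```python
import re

# 4 GiB expressed in kB, the unit /proc/meminfo reports in.
FOUR_GIB_KB = 4 * 1024 * 1024


def mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1))


def is_32bit_x86(machine):
    """True for uname machine strings such as i386, i586 or i686."""
    return re.fullmatch(r"i[3-6]86", machine) is not None


def should_consider_64bit(machine, meminfo_text):
    """True when a 32-bit x86 kernel is managing more than 4 GiB of RAM."""
    return is_32bit_x86(machine) and mem_total_kb(meminfo_text) > FOUR_GIB_KB
```

On a live system the inputs would come from platform.machine() and the contents of /proc/meminfo.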

Alan


Re: 2.6.23.8: OOM killer kills wrong jobs

2007-12-17 Thread Alan Cox
> ...but I've run into a situation in which a system on which I *have* set
> no overcommit is being blasted by the OOM killer anyway.

Looks like the kernel is eating all the resources needed.

> Linux babyalcor 2.6.23.1 #1 SMP Fri Oct 26 15:35:18 EDT 2007 \
> i686 Dual Core AMD Opteron(tm) Processor 280 AuthenticAMD GNU/Linux

32bit kernel, 16GB of RAM. 

No surprise, I'm afraid. Handling 16GB on a 32bit kernel, which has to
manage it all through a small addressable memory window, is right on the
limit of what the standard kernel will handle (8GB is probably as high as
I would go). The no overcommit code ensures that user space doesn't
overcommit, but the kernel can get itself short of low memory resources
on a big box with 32bit kernels very easily. (In 64bit mode the CPU can
address all the memory directly, so the problem vanishes.)
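
To give a feel for the low memory pressure described above, here is a back-of-the-envelope Python sketch. It estimates only the kernel's per-page bookkeeping array (mem_map), and every constant in it is an assumption rather than a figure from this thread: 4 KiB pages, roughly 32 bytes per struct page on 32-bit x86, and about 896 MiB of low memory with the default 3G/1G split. Page tables, slab caches and other kernel structures also live in low memory and are not counted.

```python
PAGE_SIZE = 4096            # assumption: 4 KiB pages
STRUCT_PAGE_BYTES = 32      # assumption: approximate struct page size on 32-bit x86
LOWMEM_BYTES = 896 * 2**20  # assumption: ~896 MiB low memory (default 3G/1G split)


def mem_map_bytes(ram_bytes):
    """Bytes of per-page bookkeeping needed to track ram_bytes of RAM."""
    return (ram_bytes // PAGE_SIZE) * STRUCT_PAGE_BYTES


def lowmem_share(ram_bytes):
    """Fraction of low memory consumed by that bookkeeping alone."""
    return mem_map_bytes(ram_bytes) / LOWMEM_BYTES


for gib in (4, 8, 16):
    ram = gib * 2**30
    print(f"{gib:>2} GiB RAM: mem_map ~{mem_map_bytes(ram) // 2**20} MiB, "
          f"{lowmem_share(ram):.1%} of low memory")
```

Under these assumptions, 16 GiB of RAM costs about 128 MiB of bookkeeping, roughly 14% of low memory, before the kernel has done any useful work.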

You will *probably* get stable 16GB with the vendor tuned enterprise
kernels (RHEL, CentOS etc), or run a 64bit kernel and then the kernel
isn't trying the software equivalent of managing a filing cabinet through
the keyhole.

Alan


Re: 2.6.23.8: OOM killer kills wrong jobs

2007-12-07 Thread Alan Cox
On Fri, 07 Dec 2007 10:25:23 +0100
Martin MOKREJŠ <[EMAIL PROTECTED]> wrote:

> Hi,
>   first of all, sorry for not being up to date with how the OOM killer
> works. I think there used to be a kernel config option to disable
> OOM killer and instead kill the process which actually asks for the
> memory and supposedly caused the memory lack. That is what I would
> like to have on my system. I have a 1GB RAM laptop and use t-coffee
> software from 
> http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html
> to do some science. ;)

The OOM killer triggers where there is no way to fulfill a page request.
Something has to go and there is no real notion of "right" or "wrong"
process at that point.

You can either set no overcommit in which case you'll get failed malloc
and similar rather than allow overcommit, or you can set the OOM priority
of tasks yourself so that your specific app of choice always dies first.
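
The two options above can be sketched in Python as follows. This is a minimal illustration, not a recommended tool. The function names and the proc_root parameter are hypothetical. The /proc paths are the standard ones: vm.overcommit_memory value 2 selects strict accounting, and the per-task OOM knob is oom_adj (range -17 to 15) on kernels of this era or oom_score_adj (range -1000 to 1000) on later ones. Writing to these files may require privileges.

```python
from pathlib import Path

OVERCOMMIT_MODES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return (mode, description) for the current overcommit policy."""
    mode = int(Path(path).read_text().strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown")


def prefer_as_oom_victim(pid, proc_root="/proc"):
    """Mark pid as the task the OOM killer should pick first.

    Tries the newer oom_score_adj file, then falls back to oom_adj.
    Returns the name of the file that was written.
    """
    task_dir = Path(proc_root) / str(pid)
    for name, value in (("oom_score_adj", "1000"), ("oom_adj", "15")):
        knob = task_dir / name
        if knob.exists():
            knob.write_text(value)
            return name
    raise FileNotFoundError(f"no OOM adjustment file under {task_dir}")
```

A job wrapper could call prefer_as_oom_victim(os.getpid()) before starting a memory-hungry run, so that the job itself is killed rather than an unrelated process.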

Alan


Re: 2.6.23.8: OOM killer kills wrong jobs

2007-12-07 Thread Dan Kegel
Martin Mokrejš wrote:
> first of all, sorry for not being up to date with how the OOM killer
> works. I think there used to be a kernel config option to disable
> OOM killer and instead kill the process which actually asks for the
> memory and supposedly caused the memory lack. That is what I would
> like to have on my system. I have a 1GB RAM laptop

You probably just need to add more swap space on your system.

Any time the OOM killer fires, something's wrong with the
system, and it's more productive to deal with that than to
wish for a more accurate OOM killer; see http://lwn.net/Articles/111408/

When I was working at a company that used embedded Linux,
I eventually figured this out, and patched the kernel to panic on OOM
conditions; that gave users the right incentive to avoid
configuring jobs that caused the system to run out of memory.
- Dan