Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Simon,

> So if he configures sparse memory, I think the issue can be solved.

In my config file I have:

  CONFIG_HAVE_SPARSE_IRQ=y
  CONFIG_SPARSE_IRQ=y
  CONFIG_ARCH_SPARSEMEM_ENABLE=y
  # CONFIG_SPARSEMEM_MANUAL is not set
  CONFIG_SPARSEMEM_STATIC=y
  # CONFIG_INPUT_SPARSEKMAP is not set
  # CONFIG_SPARSE_RCU_POINTER is not set

Is that sufficient for sparse memory, or should I try something else?
Or maybe you meant that some kernel source patches might be possible in
the sparse memory code?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/201302242210.r1omadad021...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On 01/14/2013 11:00 PM, Dave Hansen wrote:
> On 01/11/2013 07:31 PM, paul.sz...@sydney.edu.au wrote:
>> Seems that any i386 PAE machine will go OOM just by running a few
>> processes. To reproduce:
>>   sh -c 'n=0; while [ $n -lt 1 ]; do sleep 600 & ((n=n+1)); done'
>> My machine has 64GB RAM. With previous OOM episodes, it seemed that
>> running (booting) it with mem=32G might avoid OOM; but an OOM was
>> obtained just the same, and also with lower memory:
>>   Memory      sleeps to OOM   free shows total
>>   (mem=64G)        5300          64447796
>>   mem=32G         10200          31155512
>>   mem=16G         13400          14509364
>>   mem=8G          14200           6186296
>>   mem=6G          15200           4105532
>>   mem=4G          16400           2041364
>> The machine does not run out of highmem, nor does it use any swap.
>
> I think what you're seeing here is that, as the amount of total memory
> increases, the amount of lowmem available _decreases_ due to inflation
> of mem_map[] (and a few other more minor things). The number of sleeps

So if he configures sparse memory, I think the issue can be solved.

> you can do is bound by the number of processes, as you noticed from
> ulimit. Creating processes that don't use much memory eats a relatively
> large amount of low memory.
>
> This is a sad (and counterintuitive) fact: more RAM actually *CREATES*
> RAM bottlenecks on 32-bit systems.
>
>> On my large machine, 'free' fails to show about 2GB memory, e.g. with
>> mem=16G it shows:
>>
>> root@zeno:~# free -l
>>              total       used       free     shared    buffers     cached
>> Mem:      14509364     435440   14073924          0       4068     111328
>> Low:        769044     120232     648812
>> High:     13740320     315208   13425112
>> -/+ buffers/cache:     320044   14189320
>> Swap:    134217724          0  134217724
>
> You probably have a memory hole. mem=16G means "give me all the memory
> below the physical address at 16GB". It does *NOT* mean, "give me
> enough memory such that 'free' will show ~16G available." If you have a
> 1.5GB hole below 16GB, and you do mem=16G, you'll end up with ~14.5GB
> available.
>
> The e820 map (during early boot in dmesg) or /proc/iomem will let you
> locate your memory holes.

Dear Dave, two questions here:

1) The e820 map is read from the BIOS, correct? So are all the ranges
   shown in /proc/iomem set up by the BIOS?
2) Only the "System RAM" ranges in /proc/iomem can be treated as real
   memory, and all other ranges can be treated as holes, correct?
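To put rough numbers on the mem_map[] point quoted above, here is a minimal sketch. The 4 KiB page size and the 32 bytes per struct page are assumptions, not figures from the thread; the real struct page size depends on kernel version and configuration. The function name mem_map_mb is made up for illustration.

```sh
# Rough estimate of how much memory mem_map[] needs for a given RAM size.
# ASSUMPTIONS (not from the thread): 4 KiB pages, 32 bytes per struct page.
PAGE_KB=4
STRUCT_PAGE_BYTES=32

# mem_map_mb RAM_GB -> approximate size of mem_map[] in MiB
mem_map_mb() {
    ram_gb=$1
    pages=$((ram_gb * 1024 * 1024 / PAGE_KB))
    echo $((pages * STRUCT_PAGE_BYTES / 1024 / 1024))
}

for gb in 4 16 64; do
    echo "mem=${gb}G: mem_map[] needs about $(mem_map_mb "$gb") MiB"
done
```

Under these assumptions the array grows linearly with RAM, while the lowmem it has to fit into stays fixed, which is the effect Dave describes.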
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On Thu 2013-01-31 23:38:27, Phil Turmel wrote:
> On 01/31/2013 10:13 PM, paul.sz...@sydney.edu.au wrote:
> > [trim /] Does not that prove that PAE is broken?
>
> Please, Paul, take *yes* for an answer. It is broken. You've received
> multiple dissertations on why it is going to stay that way. Unless you
> fix it yourself, and everyone seems to be politely wishing you the best
> of luck with that.

It is not Paul's job to fix PAE. It is the job of whoever broke it to do
so.

If it is broken with 2GB of RAM, it is clearly not the known "lowmem
starvation" issue, it is something else... and probably worth
debugging.

So, Paul, if you have time and interest... Try to find some old kernel
version where the sleep test works with PAE. Hopefully there is one. Then
do a bisection... the author of the patch should then fix it. (And if
not, at least you have a patch you can revert.)

rjw is worth cc-ing at that point.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On 01/31/2013 10:13 PM, paul.sz...@sydney.edu.au wrote:
> [trim /] Does not that prove that PAE is broken?

Please, Paul, take *yes* for an answer. It is broken. You've received
multiple dissertations on why it is going to stay that way. Unless you
fix it yourself, and everyone seems to be politely wishing you the best
of luck with that.

> Cheers, Paul

Regards,

Phil
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Ben,

>>>> PAE is broken for any amount of RAM.
>>> No it isn't.
>> Could I please ask you to expand on that?
>
> I already did, a few messages back.

OK, thanks. Noting however that fewer than those back, I said:
  ... PAE with any RAM fails the "sleep test":
  n=0; while [ $n -lt 33000 ]; do sleep 600 & ((n=n+1)); done
and somewhere also said that non-PAE passes. Does not that prove that
PAE is broken?

Cheers, Paul
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On Fri, 2013-02-01 at 13:12 +1100, paul.sz...@sydney.edu.au wrote:
> Dear Ben,
>
> >> PAE is broken for any amount of RAM.
> >
> > No it isn't.
>
> Could I please ask you to expand on that?

I already did, a few messages back.

Ben.

-- 
Ben Hutchings
Everything should be made as simple as possible, but not simpler.
                                                           - Albert Einstein
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Ben,

>> PAE is broken for any amount of RAM.
>
> No it isn't.

Could I please ask you to expand on that?

Thanks, Paul
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On Fri, 2013-02-01 at 10:06 +1100, paul.sz...@sydney.edu.au wrote:
> Dear Ben,
>
> > Based on your experience I might propose to change the automatic kernel
> > selection for i386 so that we use 'amd64' on a system with >16GB RAM and
> > a capable processor.
>
> Don't you mean change to amd64 for >4GB (or any RAM), never using PAE?
> PAE is broken for any amount of RAM.
[...]

No it isn't.

Ben.

-- 
Ben Hutchings
Everything should be made as simple as possible, but not simpler.
                                                           - Albert Einstein
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Ben,

(Removing the mailing lists linux-ker...@vger.kernel.org
linux...@kvack.org from CC, as this may be of no interest to them.)

>> Seems that amd64 now works "somewhat": on Debian the linux-image package
>> is tricky to install,
>
> If you do an i386 (userland) installation then you must either select
> expert mode to get a choice of kernel packages, or else install the
> 'amd64' kernel package afterward.
>
>> and linux-headers is even harder.
>
> In what way?

Something about dependencies; though some of that may also be due to my
mixing of squeeze and wheezy 3.2.35 kernels. I will wait for the Debian
defaults to change to amd64 before reporting these oddities.

Cheers, Paul
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Ben,

> Based on your experience I might propose to change the automatic kernel
> selection for i386 so that we use 'amd64' on a system with >16GB RAM and
> a capable processor.

Don't you mean change to amd64 for >4GB (or any RAM), never using PAE?
PAE is broken for any amount of RAM. More precisely, PAE with any RAM
fails the "sleep test":
  n=0; while [ $n -lt 33000 ]; do sleep 600 & ((n=n+1)); done
and with >32GB fails the "write test":
  n=0; while [ $n -lt 99 ]; do dd bs=1M count=1024 if=/dev/zero of=x$n; ((n=n+1)); done
Why do you think 16GB is significant?

Thanks, Paul
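The "sleep test" above relies on ((n=n+1)), which is a bash extension and may not behave as intended under other shells. A possible POSIX-sh variant is sketched below. The function name spawn_sleeps and its two parameters are made up for illustration; the counts and durations from the thread are passed in rather than hard-coded.

```sh
# spawn_sleeps COUNT SECONDS
# Start COUNT background 'sleep SECONDS' processes and print how many
# were started. Uses POSIX $((...)) instead of the bash-only ((...)).
spawn_sleeps() {
    count=$1
    seconds=$2
    n=0
    while [ "$n" -lt "$count" ]; do
        # Detach stdout/stderr so callers capturing output do not block.
        sleep "$seconds" >/dev/null 2>&1 &
        n=$((n + 1))
    done
    echo "$n"
}

# Example matching the thread's test (left commented out, since on an
# affected PAE machine it is reported to trigger OOM):
# spawn_sleeps 33000 600
```

Running it with a small count and a short duration first is a cheap way to check the loop before trying the full reproducer.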
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On Thu, 2013-01-31 at 20:07 +1100, paul.sz...@sydney.edu.au wrote:
> Dear Ben,
>
> Thanks for the repeated explanations.
>
> > PAE was a stop-gap ...
> > ... [PAE] completely untenable.
>
> Is this a good time to withdraw PAE, to tell the world that it does not
> work? Maybe you should have had such comments in the code.
>
> Seems that amd64 now works "somewhat": on Debian the linux-image package
> is tricky to install,

If you do an i386 (userland) installation then you must either select
expert mode to get a choice of kernel packages, or else install the
'amd64' kernel package afterward.

> and linux-headers is even harder.

In what way?

> Is there work being done to make this smoother?
[...]

Debian users are now generally installing a full amd64 (userland and
kernel) installation. The default installation image linked from
www.debian.org is the 32/64-bit net-installer which will install amd64
if the system is capable of it.

Based on your experience I might propose to change the automatic kernel
selection for i386 so that we use 'amd64' on a system with >16GB RAM and
a capable processor.

Ben.

-- 
Ben Hutchings
If more than one person is responsible for a bug, no one is at fault.
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Ben,

Thanks for the repeated explanations.

> PAE was a stop-gap ...
> ... [PAE] completely untenable.

Is this a good time to withdraw PAE, to tell the world that it does not
work? Maybe you should have had such comments in the code.

Seems that amd64 now works "somewhat": on Debian the linux-image package
is tricky to install, and linux-headers is even harder. Is there work
being done to make this smoother?

---

I am still not convinced by the "lowmem starvation" explanation: because
then PAE should have worked fine on my 3GB machine; maybe I should also
try PAE on my 512MB laptop. - Though, what do I know, have not yet found
the buggy line of code I believe is lurking there...

Thanks, Paul
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On Thu, 2013-01-31 at 06:40 +1100, paul.sz...@sydney.edu.au wrote:
> Dear Pavel and Dave,
>
> > The assertion was that 4GB with no PAE passed a forkbomb test (ooming)
> > while 4GB of RAM with PAE hung, thus _PAE_ is broken.
>
> Yes, PAE is broken. Still, maybe the above needs slight correction:
> non-PAE HIGHMEM4G passed the "sleep test": no OOM, nothing unexpected;
> whereas PAE OOMed then hung (tested with various RAM from 3GB to 64GB).
>
> The feeling I get is that amd64 is proposed as a drop-in replacement for
> PAE, that support and development of PAE is gone, that PAE is dead.

PAE was a stop-gap that kept x86-32 alive on servers until x86-64 came
along (though it was supposed to be ia64...). That's why I was surprised
you were still trying to run a 32-bit kernel.

The fundamental problem with Linux on 32-bit systems for the past ~10
years has been that RAM sizes approached and exceeded the 32-bit virtual
address space and the kernel can't keep it all mapped.

Whenever a task makes a system call the kernel will continue to use the
same virtual memory mappings to access that task's memory, as well as
its own memory. Which means both of those have to fit within the
virtual address space. (The alternative of using separate address
spaces is pretty bad for performance - see OS X as an example. And it
only helps you as far as 4GB RAM.)

The usual split on 32-bit machines is 3GB virtual address space for each
task and 1GB for the kernel. Part of that 1GB is reserved for
memory-mapped I/O and temporary mappings, and the rest is mapped to the
beginning of RAM (lowmem). All the remainder of RAM is highmem,
available for allocation by tasks but not for the kernel's private data
(in general).

Switching to PAE does not change the amount of lowmem, but it does make
hardware page table entries (which of course live in lowmem) twice as
big. This increases the pressure on lowmem a little, which probably
explains the negative result of your 'sleep test'. However if you then
try to take full advantage of the 64GB range of PAE, as you saw earlier,
the shortage of lowmem relative to highmem becomes completely untenable.

Ben.

-- 
Ben Hutchings
If more than one person is responsible for a bug, no one is at fault.
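The "twice as big" remark about page table entries can be turned into a small calculation. The sketch below assumes 4 KiB pages and the x86 entry sizes of 4 bytes (non-PAE) and 8 bytes (PAE); the function name pt_page_covers_kb is made up for illustration. It shows how much virtual address space a single last-level page-table page can map in each mode.

```sh
# How much address space does one 4 KiB page-table page map?
# ASSUMPTION: 4 KiB pages; entry size is 4 bytes without PAE, 8 with PAE.
PAGE_BYTES=4096

# pt_page_covers_kb ENTRY_BYTES -> KiB mapped by one page-table page
pt_page_covers_kb() {
    entries=$((PAGE_BYTES / $1))
    echo $((entries * PAGE_BYTES / 1024))
}

echo "non-PAE (4-byte entries): $(pt_page_covers_kb 4) KiB per page-table page"
echo "PAE     (8-byte entries): $(pt_page_covers_kb 8) KiB per page-table page"
```

With half as many entries per page, a PAE kernel needs roughly twice as many page-table pages to map the same amount of address space.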
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Pavel and Dave,

> The assertion was that 4GB with no PAE passed a forkbomb test (ooming)
> while 4GB of RAM with PAE hung, thus _PAE_ is broken.

Yes, PAE is broken. Still, maybe the above needs slight correction:
non-PAE HIGHMEM4G passed the "sleep test": no OOM, nothing unexpected;
whereas PAE OOMed then hung (tested with various RAM from 3GB to 64GB).

The feeling I get is that amd64 is proposed as a drop-in replacement for
PAE, that support and development of PAE is gone, that PAE is dead.

Cheers, Paul
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On 01/30/2013 04:51 AM, Pavel Machek wrote:
> Are you saying that HIGHMEM configuration with 4GB ram is not expected
> to work?

Not really. The assertion was that 4GB with no PAE passed a forkbomb
test (ooming) while 4GB of RAM with PAE hung, thus _PAE_ is broken.
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Hi!

> > I understand that more RAM leaves less lowmem. What is unacceptable is
> > that PAE crashes or freezes with OOM: it should gracefully handle the
> > issue. Noting that (for a machine with 4GB or under) PAE fails where the
> > HIGHMEM4G kernel succeeds and survives.
>
> You have found a delta, but you're not really making apples-to-apples
> comparisons. The page tables (a huge consumer of lowmem in your bug
> reports) have much more overhead on a PAE kernel. A process with a
> single page faulted in with PAE will take at least 4 pagetable pages
> (it's 7 in practice for me with sleeps). It's 2 pages minimum (and in
> practice with sleeps) on HIGHMEM4G.
>
> There's probably a bug here. But, it's incredibly unlikely to be seen
> in practice on anything resembling a modern system. The 'sleep' issue
> is easily worked around by upgrading to a 64-bit kernel, or using

Are you saying that HIGHMEM configuration with 4GB ram is not expected
to work?

								Pavel
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On 01/17/2013 01:04 PM, paul.sz...@sydney.edu.au wrote:
>>> On my large machine, 'free' fails to show about 2GB memory ...
>> You probably have a memory hole. ...
>> The e820 map (during early boot in dmesg) or /proc/iomem will let you
>> locate your memory holes.
>
> Now that my machine is running an amd64 kernel, 'free' shows total Mem
> 65854128 (up from 64447796 with PAE kernel), and I do not see much
> change in /proc/iomem output (below). Is that as should be?

Yeah, that all looks sane. Your increased memory is because your 64GB
machine had some of its memory mapped _above_ the 64GB physical memory
limit that PAE has.

/proc/iomem is generally just a dump of what the hardware *is*, so it
shouldn't change between kernels.
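One way to compare what 'free' reports against what the hardware provides is to add up the "System RAM" ranges in /proc/iomem. The following is a minimal sketch; the function name sum_system_ram is made up for illustration, and it assumes the usual "start-end : name" line format with full hexadecimal addresses. On many kernels the real addresses are only visible to root.

```sh
# sum_system_ram: read /proc/iomem-style lines on stdin and print the
# total number of bytes covered by "System RAM" ranges.
sum_system_ram() {
    total=0
    while read -r range sep name; do
        # Only count entries whose name is exactly "System RAM".
        [ "$name" = "System RAM" ] || continue
        # Skip anything that is not of the form START-END.
        case $range in
            ?*-?*) ;;
            *) continue ;;
        esac
        start=${range%-*}
        end=${range#*-}
        # Ranges are inclusive, hence the +1.
        total=$((total + 0x$end - 0x$start + 1))
    done
    echo "$total"
}

# Usage (as root, so that real addresses are shown):
#   sum_system_ram < /proc/iomem
```

Any gap between this total and the installed RAM points at ranges that are reserved, or at holes of the kind discussed above.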
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Dave,

>> On my large machine, 'free' fails to show about 2GB memory ...
> You probably have a memory hole. ...
> The e820 map (during early boot in dmesg) or /proc/iomem will let you
> locate your memory holes.

Now that my machine is running an amd64 kernel, 'free' shows total Mem
65854128 (up from 64447796 with PAE kernel), and I do not see much
change in /proc/iomem output (below). Is that as should be?

Thanks, Paul

---

root@zeno:~# uname -a
Linux zeno.maths.usyd.edu.au 3.2.35-pk06.12-amd64 #2 SMP Thu Jan 17 13:19:53 EST 2013 x86_64 GNU/Linux
root@zeno:~# free
             total       used       free     shared    buffers     cached
Mem:      65854128    1591704   64262424          0     227036     175620
-/+ buffers/cache:    1189048   64665080
Swap:    195312636          0  195312636
root@zeno:~# cat /proc/iomem
- : reserved
0001-00099bff : System RAM
00099c00-0009 : reserved
000a-000b : PCI Bus :00
000c-000d : PCI Bus :00
000c-000c7fff : Video ROM
000c8000-000cf5ff : Adapter ROM
000cf800-000d07ff : Adapter ROM
000d0800-000d0bff : Adapter ROM
000e-000f : reserved
000f-000f : System ROM
0010-7e445fff : System RAM
0100-0168f8c6 : Kernel code
0168f8c7-018f24bf : Kernel data
0197d000-019dafff : Kernel bss
7e446000-7e565fff : ACPI Non-volatile Storage
7e566000-7f1e2fff : reserved
7f1e3000-7f25efff : ACPI Tables
7f25f000-7f31cfff : reserved
7f31d000-7f323fff : ACPI Non-volatile Storage
7f324000-7f333fff : reserved
7f334000-7f33bfff : ACPI Non-volatile Storage
7f33c000-7f365fff : reserved
7f366000-7f7f : ACPI Non-volatile Storage
7f80-7fff : RAM buffer
8000-dfff : PCI Bus :00
8000-8fff : PCI MMCONFIG [bus 00-ff]
8000-8fff : reserved
9000-900f : :00:16.0
9010-901f : :00:16.1
dd00-ddff : PCI Bus :08
dd00-ddff : :08:03.0
de00-de4f : PCI Bus :07
de00-de3f : :07:00.0
de47c000-de47 : :07:00.0
de60-de6f : PCI Bus :02
df00-df8f : PCI Bus :08
df00-df7f : :08:03.0
df80-df803fff : :08:03.0
df90-df9f : PCI Bus :07
dfa0-dfaf : PCI Bus :02
dfa0-dfa1 : :02:00.1
dfa0-dfa1 : igb
dfa2-dfa3 : :02:00.0
dfa2-dfa3 : igb
dfa4-dfa43fff : :02:00.1
dfa4-dfa43fff : igb
dfa44000-dfa47fff : :02:00.0
dfa44000-dfa47fff : igb
dfb0-dfb03fff : :00:04.7
dfb04000-dfb07fff : :00:04.6
dfb08000-dfb0bfff : :00:04.5
dfb0c000-dfb0 : :00:04.4
dfb1-dfb13fff : :00:04.3
dfb14000-dfb17fff : :00:04.2
dfb18000-dfb1bfff : :00:04.1
dfb1c000-dfb1 : :00:04.0
dfb2-dfb200ff : :00:1f.3
dfb21000-dfb217ff : :00:1f.2
dfb21000-dfb217ff : ahci
dfb22000-dfb223ff : :00:1d.0
dfb22000-dfb223ff : ehci_hcd
dfb23000-dfb233ff : :00:1a.0
dfb23000-dfb233ff : ehci_hcd
dfb25000-dfb25fff : :00:05.4
dfffc000-dfffdfff : pnp 00:02
e000-fbff : PCI Bus :80
fbe0-fbef : PCI Bus :84
fbe0-fbe3 : :84:00.0
fbe4-fbe5 : :84:00.0
fbe6-fbe63fff : :84:00.0
fbf0-fbf03fff : :80:04.7
fbf04000-fbf07fff : :80:04.6
fbf08000-fbf0bfff : :80:04.5
fbf0c000-fbf0 : :80:04.4
fbf1-fbf13fff : :80:04.3
fbf14000-fbf17fff : :80:04.2
fbf18000-fbf1bfff : :80:04.1
fbf1c000-fbf1 : :80:04.0
fbf2-fbf20fff : :80:05.4
fbffe000-fbff : pnp 00:12
fc00-fcff : pnp 00:01
fd00-fdff : pnp 00:01
fe00-feaf : pnp 00:01
feb0-febf : pnp 00:01
fec0-fec003ff : IOAPIC 0
fec01000-fec013ff : IOAPIC 1
fec4-fec403ff : IOAPIC 2
fed0-fed003ff : HPET 0
fed08000-fed08fff : pnp 00:0c
fed1c000-fed3 : reserved
fed1c000-fed1 : pnp 00:0c
fed45000-fedf : pnp 00:01
fee0-fee00fff : Local APIC
ff00- : reserved
ff00- : pnp 00:0c
1-107fff : System RAM
root@zeno:~#

---

For comparison, output obtained (and reported previously) when machine
was running PAE kernel:

root@zeno:~# cat /proc/iomem
- : reserved
0001-00099bff : System RAM
00099c00-0009 : reserved
000a-000b : PCI Bus :00
000a-000b : Video RAM area
000c-000d : PCI Bus :00
000c-000c7fff : Video ROM
000c8000-000cf5ff : Adapter ROM
000cf800-000d07ff : Adapter ROM
000d0800-000d0bff : Adapter ROM
000e-000f : reserved
000f-000f : System ROM
0010-7e445fff : System RAM
0100-01610e15 : Kernel code
01610e16-01802dff : Kernel data
0188-018b2fff : Kernel bss
7e446000-7e565fff : AC
[...]
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On Wed, Jan 16, 2013 at 1:22 AM, Paul Szabo wrote:
> Dear Sedat,
>
>> ... it really makes sense to switch to x86_64
>> (amd64) architecture when you have a modern computer.
>> Switching makes even more sense when you have more than 4GiB RAM.
>
> You seem to say that one should switch to amd64 (if hardware allows),
> even with less than 4GB RAM (where 32-bit non-PAE HIGHMEM4G kernel would
> work fine), and that one should definitely switch with over 4GB RAM.
> There would be no need or use for PAE kernels, which should be dropped.
>
> I think I agree.
>

[ OK, you took the thread from LKML to the Debian bug, anyway ]

Where I see problems is the fact that you are more or less "forced" to
switch to 64-bit. Why? (Read my thoughts below.)

The bigger problem I am seeing is that, as most developers decided to go
the 64-bit way, the 32-bit path is no longer tested properly.
When I insisted on running a 32-bit system I stumbled over so much
UNTESTED software.
I talked with a lot of developers around the Linux kernel and Debian
world and those guys - if you ask them in private - would drop 32-bit
entirely.
To be honest - I am speaking of the x86 world, and I also darkly
remember issues in early MULTIAAARGH support on Debian.
( As an example: Building a gcc upstream release tarball (unpatched!) in
a multiarch environment. Look for my bug reports if you are
interested. )

If there exist no more 32-bit x86 veterans... The world will turn around
- approximately the same way and speed :-).

So if you want to concentrate on working, make your decisions carefully!
( No one pays you for fixing your working OS all the time - saying that
as a long-term Debian/sid user. )

Just from my experiences.

Regards,
- Sedat -

> Thanks, Paul
>
> ---
>
> Quoting in full for the benefit of 695...@bugs.debian.org :
[...]
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Hi Paul,

paul.sz...@sydney.edu.au wrote:
> Dear Sedat,

>> ... it really makes sense to switch to x86_64
>> (amd64) architecture when you have a modern computer.
>> Switching makes even more sense when you have more than 4GiB RAM.
>
> You seem to say that one should switch to amd64 (if hardware allows),
> even with less than 4GB RAM (where 32-bit non-PAE HIGHMEM4G kernel would
> work fine), and that one should definitely switch with over 4GB RAM.
> There would be no need or use for PAE kernels, which should be dropped.
>
> I think I agree.

Yes, on 64-bit CPUs the 64-bit kernel is pretty much always the better
choice. Unfortunately some existing systems use 32-bit CPUs.

Hoping that clarifies,
Jonathan
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Sedat,

> ... it really makes sense to switch to x86_64
> (amd64) architecture when you have a modern computer.
> Switching makes even more sense when you have more than 4GiB RAM.

You seem to say that one should switch to amd64 (if hardware allows),
even with less than 4GB RAM (where 32-bit non-PAE HIGHMEM4G kernel would
work fine), and that one should definitely switch with over 4GB RAM.
There would be no need or use for PAE kernels, which should be dropped.

I think I agree.

Thanks, Paul

---

Quoting in full for the benefit of 695...@bugs.debian.org :

> From sedat.di...@gmail.com Tue Jan 15 21:26:14 2013
> Date: Tue, 15 Jan 2013 11:25:41 +0100
> Subject: Re: [RFC] Reproducible OOM with just a few sleeps
> From: Sedat Dilek
> To: paul.sz...@sydney.edu.au, Paul Szabo
> Cc: LKML, linux-mm,
>     Ben Hutchings
>
> Hi Paul,
>
> I followed a bit the thread you started in [1].
>
> As you might know i386 got eliminated in Linux-3.8.
>
> I had several discussions with the Debian kernel-team about the iN86
> (N=4..6) and PAE kernel-flavours.
> On the one hand I can understand the reduction of linux-images
> especially for iN86.
> Even i486 is a bit unfirm as there is no much hardware around, but
> Debian will keep i486 for a while (release maintenance).
>
> Topic PAE:
> Unfortunately, I had a notebook with a Intel Centrino Banias CPU (no
> PAE) which should use the -486 kernel-flavour due to the Debian
> kernel-team.
> I played with some different kernel-setup which did not give me more
> benefit (openssl benchmarks etc.)
> The -686-pae kernel did run on my hardware, but as known with all the
> SMP-NO-OPs.
>
> Depending on the hardware, it really makes sense to switch to x86_64
> (amd64) architecture when you have a modern computer.
> Switching makes even more sense when you have more than 4GiB RAM.
> IMHO using a -686-amd64 Debian kernel makes ZERO sense, real 64-Bit or die!
>
> I switched to 64-bit... and I switched from Debian/sid to
> Ubuntu/precise as well :-).
> ( NOTE: I am working here since April 2012 in a WUBI environment (no
> native Ubuntu Linux) :-). )
>
> And I am building my kernels by myself.
> So I know very well whom to blame :-).
>
> Some last words: I had several fruitful or fruitless discussions with
> the Debian kernel-team, but I can confirm (with all my heart) this
> team makes a fantastic job.
> I can recommend you Ben's blog (recently I read a series about news in
> the Debian/wheezy kernel) if your world is Debian or Ubuntu (Debian !=
> Ubuntu).
>
> Just my 0.02EUR (no British pound, here as well: when you are a member
> of the EU chose EUR not pound!).
>
> Regards,
> - Sedat -
>
>
> [1] http://marc.info/?t=13579617221&r=1&w=2
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Dave,

>> ... What is unacceptable is that PAE crashes or freezes with OOM:
>> it should gracefully handle the issue. Noting that (for a machine
>> with 4GB or under) PAE fails where the HIGHMEM4G kernel succeeds ...
>
> You have found a delta, but you're not really making apples-to-apples
> comparisons. The page tables ...

I understand that the exact sizes of page tables are very important to
developers. To the rest of us, all that matters is that the kernel moves
them to highmem or swap or whatever, that it maybe emits some error
message but that it does not crash or freeze.

> There's probably a bug here. But, it's incredibly unlikely to be seen
> in practice on anything resembling a modern system. ...

Probably, I found the bug on a very modern and brand-new system, just
trying to copy a few ISO image files and trying to log in a hundred
students. My machine crashed under those very practical and normal
circumstances. The demos with dd and sleep were just that: easily
reproducible demos.

> ... easily worked around by upgrading to a 64-bit kernel ...

Do you mean that PAE should never be used, but to use amd64 instead?

> ... Raising the vm.min_free_kbytes sysctl (to perhaps 10x of
> its current value on your system) is likely to help the hangs too,
> although it will further "consume" lowmem.

I have tried that, it did not work. As you say, it is backward.

> ... for a bug with ... so many reasonable workarounds ...

Only one workaround was proposed: use amd64. PAE is buggy and useless,
should be deprecated and removed.

Cheers, Paul
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On 01/14/2013 12:36 PM, paul.sz...@sydney.edu.au wrote:
> I understand that more RAM leaves less lowmem. What is unacceptable is
> that PAE crashes or freezes with OOM: it should gracefully handle the
> issue. Noting that (for a machine with 4GB or under) PAE fails where the
> HIGHMEM4G kernel succeeds and survives.

You have found a delta, but you're not really making apples-to-apples
comparisons. The page tables (a huge consumer of lowmem in your bug
reports) have much more overhead on a PAE kernel. A process with a
single page faulted in with PAE will take at least 4 pagetable pages
(it's 7 in practice for me with sleeps). It's 2 pages minimum (and in
practice with sleeps) on HIGHMEM4G.

There's probably a bug here. But, it's incredibly unlikely to be seen
in practice on anything resembling a modern system. The 'sleep' issue
is easily worked around by upgrading to a 64-bit kernel, or using sane
ulimit values. Raising the vm.min_free_kbytes sysctl (to perhaps 10x of
its current value on your system) is likely to help the hangs too,
although it will further "consume" lowmem.

I appreciate your persistence here, but for a bug with such a specific
use case, and with so many reasonable workarounds, it's not something I
want to dig into much deeper.

I'll be happy to answer any questions if you want to go digging deeper,
or want some pointers on where to go looking to fix this properly.

Archive: http://lists.debian.org/50f4a92f.2070...@linux.vnet.ibm.com
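A rough way to see why the per-process page-table count matters is to multiply it out. The sketch below is only an illustration: the function name `pagetable_lowmem_kib` is made up for this note, the 4 KiB page size is the usual i386 value, and it assumes that every page-table page is allocated from lowmem, which the thread suggests but does not spell out.

```sh
# Estimate the memory (in KiB) taken by page tables alone.
#   $1 = number of processes
#   $2 = page-table pages per process (7 on PAE and 2 on HIGHMEM4G,
#        per the figures quoted in the message above)
# Assumption: 4 KiB pages, all of them taken from lowmem.
pagetable_lowmem_kib() {
    echo $(( $1 * $2 * 4 ))
}

pagetable_lowmem_kib 16400 7   # PAE:       459200 KiB (about 448 MiB)
pagetable_lowmem_kib 16400 2   # HIGHMEM4G: 131200 KiB (about 128 MiB)
```

With well under 900 MB of lowmem in total on these machines, the PAE figure alone would account for a large share of it, which is consistent with the OOM appearing only on the PAE kernel.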
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On Tue, 2013-01-15 at 07:36 +1100, paul.sz...@sydney.edu.au wrote:
> Dear Dave,
>
> >> Seems that any i386 PAE machine will go OOM just by running a few
> >> processes. To reproduce:
> >> sh -c 'n=0; while [ $n -lt 1 ]; do sleep 600 & ((n=n+1)); done'
> >> ...
> > I think what you're seeing here is that, as the amount of total memory
> > increases, the amount of lowmem available _decreases_ due to inflation
> > of mem_map[] (and a few other more minor things). The number of sleeps
> > you can do is bound by the number of processes, as you noticed from
> > ulimit. Creating processes that don't use much memory eats a relatively
> > large amount of low memory.
> > This is a sad (and counterintuitive) fact: more RAM actually *CREATES*
> > RAM bottlenecks on 32-bit systems.
>
> I understand that more RAM leaves less lowmem. What is unacceptable is
> that PAE crashes or freezes with OOM: it should gracefully handle the
> issue.
[...]

Sorry, let me know where to send your refund.

Ben.

--
Ben Hutchings
Quantity is no substitute for quality, but it's the only one we've got.
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Dave,

>> Seems that any i386 PAE machine will go OOM just by running a few
>> processes. To reproduce:
>> sh -c 'n=0; while [ $n -lt 1 ]; do sleep 600 & ((n=n+1)); done'
>> ...
> I think what you're seeing here is that, as the amount of total memory
> increases, the amount of lowmem available _decreases_ due to inflation
> of mem_map[] (and a few other more minor things). The number of sleeps
> you can do is bound by the number of processes, as you noticed from
> ulimit. Creating processes that don't use much memory eats a relatively
> large amount of low memory.
> This is a sad (and counterintuitive) fact: more RAM actually *CREATES*
> RAM bottlenecks on 32-bit systems.

I understand that more RAM leaves less lowmem. What is unacceptable is
that PAE crashes or freezes with OOM: it should gracefully handle the
issue. Noting that (for a machine with 4GB or under) PAE fails where the
HIGHMEM4G kernel succeeds and survives.

>> On my large machine, 'free' fails to show about 2GB memory ...
> You probably have a memory hole. ...
> The e820 map (during early boot in dmesg) or /proc/iomem will let you
> locate your memory holes.

Thanks, that might explain it. Output of /proc/iomem below: sorry I do
not know how to interpret it.

Cheers, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia --- root@zeno:~# cat /proc/iomem - : reserved 0001-00099bff : System RAM 00099c00-0009 : reserved 000a-000b : PCI Bus :00 000a-000b : Video RAM area 000c-000d : PCI Bus :00 000c-000c7fff : Video ROM 000c8000-000cf5ff : Adapter ROM 000cf800-000d07ff : Adapter ROM 000d0800-000d0bff : Adapter ROM 000e-000f : reserved 000f-000f : System ROM 0010-7e445fff : System RAM 0100-01610e15 : Kernel code 01610e16-01802dff : Kernel data 0188-018b2fff : Kernel bss 7e446000-7e565fff : ACPI Non-volatile Storage 7e566000-7f1e2fff : reserved 7f1e3000-7f25efff : ACPI Tables 7f25f000-7f31cfff : reserved 7f31d000-7f323fff : ACPI Non-volatile Storage 7f324000-7f333fff : reserved 7f334000-7f33bfff : ACPI Non-volatile Storage 7f33c000-7f365fff : reserved 7f366000-7f7f : ACPI Non-volatile Storage 7f80-7fff : RAM buffer 8000-dfff : PCI Bus :00 8000-8fff : PCI MMCONFIG [bus 00-ff] 8000-8fff : reserved 9000-900f : :00:16.0 9010-901f : :00:16.1 dd00-ddff : PCI Bus :08 dd00-ddff : :08:03.0 de00-de4f : PCI Bus :07 de00-de3f : :07:00.0 de47c000-de47 : :07:00.0 de60-de6f : PCI Bus :02 df00-df8f : PCI Bus :08 df00-df7f : :08:03.0 df80-df803fff : :08:03.0 df90-df9f : PCI Bus :07 dfa0-dfaf : PCI Bus :02 dfa0-dfa1 : :02:00.1 dfa0-dfa1 : igb dfa2-dfa3 : :02:00.0 dfa2-dfa3 : igb dfa4-dfa43fff : :02:00.1 dfa4-dfa43fff : igb dfa44000-dfa47fff : :02:00.0 dfa44000-dfa47fff : igb dfb0-dfb03fff : :00:04.7 dfb04000-dfb07fff : :00:04.6 dfb08000-dfb0bfff : :00:04.5 dfb0c000-dfb0 : :00:04.4 dfb1-dfb13fff : :00:04.3 dfb14000-dfb17fff : :00:04.2 dfb18000-dfb1bfff : :00:04.1 dfb1c000-dfb1 : :00:04.0 dfb2-dfb200ff : :00:1f.3 dfb21000-dfb217ff : :00:1f.2 dfb21000-dfb217ff : ahci dfb22000-dfb223ff : :00:1d.0 dfb22000-dfb223ff : ehci_hcd dfb23000-dfb233ff : :00:1a.0 dfb23000-dfb233ff : ehci_hcd dfb25000-dfb25fff : :00:05.4 dfffc000-dfffdfff : pnp 00:02 e000-fbff : PCI Bus 
:80 fbe0-fbef : PCI Bus :84 fbe0-fbe3 : :84:00.0 fbe4-fbe5 : :84:00.0 fbe6-fbe63fff : :84:00.0 fbf0-fbf03fff : :80:04.7 fbf04000-fbf07fff : :80:04.6 fbf08000-fbf0bfff : :80:04.5 fbf0c000-fbf0 : :80:04.4 fbf1-fbf13fff : :80:04.3 fbf14000-fbf17fff : :80:04.2 fbf18000-fbf1bfff : :80:04.1 fbf1c000-fbf1 : :80:04.0 fbf2-fbf20fff : :80:05.4 fbffe000-fbff : pnp 00:12 fc00-fcff : pnp 00:01 fd00-fdff : pnp 00:01 fe00-feaf : pnp 00:01 feb0-febf : pnp 00:01 fec0-fec003ff : IOAPIC 0 fec01000-fec013ff : IOAPIC 1 fec4-fec403ff : IOAPIC 2 fed0-fed003ff : HPET 0 fed08000-fed08fff : pnp 00:0c fed1c000-fed3 : reserved fed1c000-fed1 : pnp 00:0c fed45000-fedf : pnp 00:01 fee0-fee00fff : Local APIC ff00- : reserved ff00- : pnp 00:0c 1-107fff : System RAM root@zeno:~# -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debi
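One way to read a /proc/iomem listing like the one above is to add up only the top-level "System RAM" ranges; everything else (reserved, ACPI, PCI windows) is address space the kernel cannot use as ordinary memory, and the gaps between RAM ranges are the "holes" Dave mentions. The following is a minimal sketch, not a standard tool: the function name `sum_system_ram` is invented here, and it assumes the usual `start-end : description` layout with hexadecimal addresses.

```sh
# Sum the sizes of all "System RAM" ranges read from standard input.
# Input format (as in /proc/iomem):  <hex start>-<hex end> : <description>
# Nested entries such as "Kernel code" have a different description and
# are skipped. Prints the total in bytes.
sum_system_ram() (
    total=0
    while read -r range sep desc; do
        [ "$desc" = "System RAM" ] || continue
        start=${range%-*}     # text before the dash
        end=${range#*-}       # text after the dash
        total=$(( total + 0x$end - 0x$start + 1 ))
    done
    echo "$total"
)

# Typical use, as root:
#   sum_system_ram < /proc/iomem
```

Comparing that total with the address of the last RAM range shows how much of the address space below it is not RAM.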
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
On 01/11/2013 07:31 PM, paul.sz...@sydney.edu.au wrote:
> Seems that any i386 PAE machine will go OOM just by running a few
> processes. To reproduce:
> sh -c 'n=0; while [ $n -lt 1 ]; do sleep 600 & ((n=n+1)); done'
> My machine has 64GB RAM. With previous OOM episodes, it seemed that
> running (booting) it with mem=32G might avoid OOM; but an OOM was
> obtained just the same, and also with lower memory:
>   Memory      sleeps to OOM   free shows total
>   (mem=64G)        5300          64447796
>   mem=32G         10200          31155512
>   mem=16G         13400          14509364
>   mem=8G          14200           6186296
>   mem=6G          15200           4105532
>   mem=4G          16400           2041364
> The machine does not run out of highmem, nor does it use any swap.

I think what you're seeing here is that, as the amount of total memory
increases, the amount of lowmem available _decreases_ due to inflation
of mem_map[] (and a few other more minor things). The number of sleeps
you can do is bound by the number of processes, as you noticed from
ulimit. Creating processes that don't use much memory eats a relatively
large amount of low memory.

This is a sad (and counterintuitive) fact: more RAM actually *CREATES*
RAM bottlenecks on 32-bit systems.

> On my large machine, 'free' fails to show about 2GB memory, e.g. with
> mem=16G it shows:
>
> root@zeno:~# free -l
>              total       used       free     shared    buffers     cached
> Mem:      14509364     435440   14073924          0       4068     111328
> Low:        769044     120232     648812
> High:     13740320     315208   13425112
> -/+ buffers/cache:     320044   14189320
> Swap:    134217724          0  134217724

You probably have a memory hole. mem=16G means "give me all the memory
below the physical address at 16GB". It does *NOT* mean, "give me
enough memory such that 'free' will show ~16G available." If you have
a 1.5GB hole below 16GB, and you do mem=16G, you'll end up with ~14.5GB
available.

The e820 map (during early boot in dmesg) or /proc/iomem will let you
locate your memory holes.

Archive: http://lists.debian.org/50f41d9d.1000...@linux.vnet.ibm.com
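The mem_map[] inflation described above can be put into numbers with a short calculation: the kernel keeps one `struct page` descriptor per physical page, and on a 32-bit kernel that array lives in lowmem. The sketch below is an estimate only. The function name `mem_map_mib` is made up for this note, and the 32-byte descriptor size is an assumed, typical value for 32-bit x86; the real size depends on kernel version and configuration.

```sh
# Estimate the size of mem_map[] in MiB.
#   $1 = installed RAM in GiB
#   $2 = size of one struct page in bytes (ASSUMPTION: about 32 on i386)
# One descriptor is needed per 4 KiB page, so 1 GiB of RAM is 262144 pages.
mem_map_mib() {
    pages=$(( $1 * 262144 ))
    echo $(( pages * $2 / 1048576 ))
}

mem_map_mib 64 32   # 64 GiB of RAM: 512 MiB of descriptors
mem_map_mib 4 32    #  4 GiB of RAM:  32 MiB of descriptors
```

Under these assumptions a 64GB machine would spend more than half of its lowmem on page descriptors before any process starts, which fits the table above, where fewer sleeps are needed to reach OOM as the mem= value grows.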
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Reported to Ubuntu also:

  PAE regression: OOM with just a few sleeps
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1098961

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

Archive: http://lists.debian.org/201301122020.r0ckk04m018...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
The issue is a regression with PAE, reproduced and verified on Ubuntu,
on my home PC with 3GB RAM.

My PC was running kernel linux-image-3.2.0-35-generic so it showed:

psz@DellE520:~$ uname -a
Linux DellE520 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:45:18 UTC 2012 i686 i686 i386 GNU/Linux
psz@DellE520:~$ free -l
             total       used       free     shared    buffers     cached
Mem:       3087972     692256    2395716          0      18276     427116
Low:        861464      71372     790092
High:      2226508     620884    1605624
-/+ buffers/cache:     246864    2841108
Swap:     20000920     258364   19742556

Then it handled the "sleep test"

  bash -c 'n=0; while [ $n -lt 33000 ]; do sleep 600 & ((n=n+1)); ((m=n%500)); if [ $m -lt 1 ]; then echo -n "$n - "; date; free -l; sleep 1; fi; done'

just fine: it was stopped only by "max user processes" (default setting
of "ulimit -u 23964") or, after raising that limit, when the machine ran
out of PID space; there was no OOM.

Installing and running the PAE kernel so it showed:

psz@DellE520:~$ uname -a
Linux DellE520 3.2.0-35-generic-pae #55-Ubuntu SMP Wed Dec 5 18:04:39 UTC 2012 i686 i686 i386 GNU/Linux
psz@DellE520:~$ free -l
             total       used       free     shared    buffers     cached
Mem:       3087620     681188    2406432          0     167332     352296
Low:        865208     214080     651128
High:      2222412     467108    1755304
-/+ buffers/cache:     161560    2926060
Swap:     20000920          0   20000920

and re-trying the "sleep test", it ran into OOM after 18000 or so sleeps
and crashed/froze so I had to press the POWER button to recover.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

Archive: http://lists.debian.org/201301121941.r0cjf5ps017...@como.maths.usyd.edu.au
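When repeating a test like the one above, it can be easier to watch the one number that matters, free lowmem, than to scan full `free -l` output. On 32-bit kernels built with highmem support, /proc/meminfo carries a `LowFree:` line in kB. The helper below is a small sketch with a made-up name, `low_free_kib`; it reads meminfo-style text on standard input so it can be tried on saved output as well.

```sh
# Print the LowFree value (in kB) from /proc/meminfo-style input.
# Prints nothing if the line is absent, e.g. on a 64-bit kernel.
low_free_kib() {
    awk '$1 == "LowFree:" { print $2 }'
}

# Typical use while the sleep test runs in another window:
#   while sleep 5; do low_free_kib < /proc/meminfo; done
```

A steadily falling value while highmem and swap stay untouched would match the behaviour reported in this thread.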
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Linux-MM,

Seems that any i386 PAE machine will go OOM just by running a few
processes. To reproduce:

  sh -c 'n=0; while [ $n -lt 1 ]; do sleep 600 & ((n=n+1)); done'

My machine has 64GB RAM. With previous OOM episodes, it seemed that
running (booting) it with mem=32G might avoid OOM; but an OOM was
obtained just the same, and also with lower memory:

  Memory      sleeps to OOM   free shows total
  (mem=64G)        5300          64447796
  mem=32G         10200          31155512
  mem=16G         13400          14509364
  mem=8G          14200           6186296
  mem=6G          15200           4105532
  mem=4G          16400           2041364

The machine does not run out of highmem, nor does it use any swap.

Comparing with my desktop PC: has 4GB RAM installed, free shows 3978592
total. Running the "sleep test", it simply froze after 16400 running...
no response to ping, will need to press the RESET button.

---

On my large machine, 'free' fails to show about 2GB memory, e.g. with
mem=16G it shows:

root@zeno:~# free -l
             total       used       free     shared    buffers     cached
Mem:      14509364     435440   14073924          0       4068     111328
Low:        769044     120232     648812
High:     13740320     315208   13425112
-/+ buffers/cache:     320044   14189320
Swap:    134217724          0  134217724

---

Please let me know of any ideas, or if you want me to run some other
test or want to see some other output.

Thanks, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia - Details for when my machine was running with 64GB RAM: In another window I was running cat /proc/slabinfo; free -l repeatedly, and output of that (just before OOM) was: + cat /proc/slabinfo slabinfo - version: 2.1 # name : tunables: slabdata fuse_request 0 0376 434 : tunables000 : slabdata 0 0 0 fuse_inode 0 0448 364 : tunables000 : slabdata 0 0 0 bsg_cmd0 0288 282 : tunables000 : slabdata 0 0 0 ntfs_big_inode_cache 0 0512 324 : tunables000 : slabdata 0 0 0 ntfs_inode_cache 0 0176 462 : tunables000 : slabdata 0 0 0 nfs_direct_cache 0 0 80 511 : tunables000 : slabdata 0 0 0 nfs_inode_cache 28 28584 284 : tunables000 : slabdata 1 1 0 isofs_inode_cache 0 0360 454 : tunables000 : slabdata 0 0 0 fat_inode_cache0 0408 404 : tunables000 : slabdata 0 0 0 fat_cache 0 0 24 1701 : tunables000 : slabdata 0 0 0 jbd2_revoke_record 0 0 32 1281 : tunables000 : slabdata 0 0 0 journal_handle 4080 4080 24 1701 : tunables000 : slabdata 24 24 0 journal_head1024 1024 64 641 : tunables000 : slabdata 16 16 0 revoke_record768768 16 2561 : tunables000 : slabdata 3 3 0 ext4_inode_cache 0 0584 284 : tunables000 : slabdata 0 0 0 ext4_free_data 0 0 40 1021 : tunables000 : slabdata 0 0 0 ext4_allocation_context 0 0112 361 : tunables00 0 : slabdata 0 0 0 ext4_prealloc_space 0 0 72 561 : tunables000 : slabdata 0 0 0 ext4_io_end0 0576 284 : tunables000 : slabdata 0 0 0 ext4_io_page 0 0 8 5121 : tunables000 : slabdata 0 0 0 ext2_inode_cache 0 0480 344 : tunables000 : slabdata 0 0 0 ext3_inode_cache1467 2079488 334 : tunables000 : slabdata 63 63 0 ext3_xattr 0 0 48 851 : tunables000 : slabdata 0 0 0 dquot168168192 422 : tunables000 : slabdata 4 4 0 rpc_inode_cache 108108448 364 : tunables000 : slabdata 3 3 0 UDP-Lite 0 0576 284 : tunables000 : slabdata 0 0 0 xfrm_dst_cache 0 0320 514 : tunables000 : slabdata 0 0 0 UDP 336336576 284 : tunables000 : 
slabdata 12 12 0 tw_sock_TCP 32 32128 321 : tun